You have a piece of writing in front of you and a quiet suspicion it was not written by the person who handed it over. Maybe a student essay. Maybe a freelancer's deliverable. Maybe a cover letter that reads a touch too polished. This guide is for the person doing the checking, not the person writing. It walks through the five-step verification method used by teachers, editors, and recruiters: paste into a detector, read the score honestly, examine sentence-level highlights, check perplexity and burstiness signals, and cross-verify with a second classifier before acting. The goal is a defensible verdict, not a gotcha.
The biggest mistake people make when checking text is running it through a single detector, looking at the headline number, and treating that as final. Honest accuracy for the best classifiers in 2026 sits between 92 and 97 percent on long-form ChatGPT output, dropping into the low 80s on short passages and rewritten text.
Turnitin leans on paragraph structure and burstiness. GPTZero is built around perplexity. Originality.ai trained heavily on SEO content. TextSight emphasises sentence-level patterns and vocabulary tells. The same passage can score 22 percent on one tool and 78 percent on another, and both detectors are doing their job correctly because they are measuring different things. A single score is a snapshot of one method, not a verdict.
The workable rule, used by university integrity offices and editors who do this for a living, is to pick two detectors with independent methods and treat agreement as the signal. If both say AI, trust it. If both say human, trust it. If they disagree, the verdict is inconclusive and you should not act on the score alone. That principle drives the five-step method below.
Automated detection is best for screening at scale and worst for one-shot verdicts. Run it across a batch of submissions to surface cases worth examining closely. Then read the flagged ones carefully, examine sentence-level highlights, and have a conversation with the author. Detection is triage; the conversation is the verdict.
The same five-step method serves three very different jobs. The signals are the same; the threshold for acting and the next conversation are different.
Professors and high-school teachers review essays that read a little too clean. The five-step method gives them sentence-level evidence to bring into an academic-integrity conversation, rather than a single confidence number that an articulate student can argue against. The highlight map is what makes the conversation defensible; it points to specific sentences and explains why each one was flagged, so the discussion is about evidence rather than gut feeling.
Editors and content leads check work from contractors before publishing or paying. The detection step here is mostly about catching undisclosed AI-only drafts that were not edited at all. A 78 percent score with the entire piece glowing red is a different finding from a 35 percent score concentrated in two introductory paragraphs. The first is a delivery problem; the second is normal AI-assisted craft.
Hiring teams check cover letters and writing samples. The verdict here is rarely "reject for AI use"; it is usually "treat this writing sample as low signal and lean on the live interview." A ChatGPT-clean cover letter does not tell you the applicant cannot write. It tells you the sample is not worth weighting heavily in your decision, and the structured interview should compensate.
The method takes roughly five minutes per text once you know the workflow. The first three scans a day are free, with no signup needed. The sequence matters; running the steps out of order leads back to single-number verdicts.
Open TextSight and paste the passage into the scan box, or upload the document directly. No account is needed for the first three checks each day. Longer passages give the classifier more signal to work with, so paste the full piece rather than a single paragraph when you can. If the text runs above the free word limit, paste the longest contiguous section rather than splicing fragments together; coherence matters to perplexity and burstiness.
You get back a 0 to 100 score that represents the classifier's confidence the passage reads as AI-generated. Treat this as a confidence number, not a verdict. Under 20 is comfortably human. 20 to 50 is mostly human with some AI-adjacent passages, common in lightly edited drafts. 50 to 75 is contested territory where structure or vocabulary is raising flags but the text could still be a human writer with a structured style. Above 75 is high confidence the text is AI-generated.
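The bands above can be written down as a small lookup, which helps when several reviewers need to read scores the same way. This is a minimal sketch, not TextSight's own logic: the function name `score_band` is hypothetical, and because the text leaves the edges ambiguous, the sketch assumes that exactly 50 and exactly 75 both fall in the contested band.

```python
def score_band(score: float) -> str:
    """Map a 0-100 detector score to the reading bands described in the guide.

    Assumption: boundary values 50 and 75 are treated as "contested",
    since the guide only says "50 to 75" and "above 75".
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 20:
        return "comfortably human"
    if score < 50:
        return "mostly human, some AI-adjacent passages"
    if score <= 75:
        return "contested"
    return "high confidence AI"


if __name__ == "__main__":
    for s in (12, 35, 62, 78):
        print(s, "->", score_band(s))
```

The label is still a confidence reading rather than a verdict; it only standardises how the headline number is described.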
Open the highlight map and read which specific sentences glow red. The pattern of the highlights matters more than the headline number. Eight clustered red sentences in one paragraph tells a different story from eight red sentences scattered across the piece. Clustered red is usually AI text that was lightly polished on top; scattered red is often a structured human writer who happens to overlap with AI patterns in places. The highlight evidence is what you bring into the conversation that follows.
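One rough way to put a number on "clustered versus scattered" is to ask what share of the flagged sentences sit in the single most-flagged paragraph. The sketch below assumes you have noted, by hand or from an export, the paragraph index of each red sentence; the function name `highlight_pattern` and the 0.6 cut-off are illustrative assumptions, not values from any detector.

```python
from collections import Counter


def highlight_pattern(flagged_paragraphs: list[int], cluster_share: float = 0.6) -> str:
    """Classify highlight layout from the paragraph index of each flagged sentence.

    cluster_share is a hypothetical threshold: if at least this share of the
    flagged sentences fall in one paragraph, call the pattern "clustered".
    """
    if not flagged_paragraphs:
        return "none"
    counts = Counter(flagged_paragraphs)
    top_share = max(counts.values()) / len(flagged_paragraphs)
    return "clustered" if top_share >= cluster_share else "scattered"


if __name__ == "__main__":
    print(highlight_pattern([2] * 8))                   # eight flags in paragraph 2
    print(highlight_pattern([0, 1, 2, 3, 4, 5, 6, 7]))  # one flag per paragraph
```

With only one or two flagged sentences the share is not meaningful, so the label is best read alongside the raw count.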
Open the detail panel and read the perplexity and burstiness numbers. Perplexity measures how predictable each word is given the words around it. AI text tends to have low perplexity because the model picks high-probability next words, which produces a smooth flow. Burstiness measures sentence-to-sentence variance in length. AI text tends to be uniform, with most sentences sitting in the 16-to-22 word range. Human writing tends to be bursty, with short punchy sentences sitting next to long extended ones. Low perplexity with low burstiness is the strongest AI signal; high burstiness with mixed perplexity is closer to natural human writing even when the headline score is elevated.
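Both signals can be approximated by hand to build intuition. The sketch below is an assumption-laden illustration, not the formula any particular detector uses: burstiness is taken as the coefficient of variation of sentence length with a naive punctuation-based splitter, and perplexity uses the standard definition (the exponential of the negative mean token log-probability), with the log-probabilities assumed to come from whatever language model you have access to.

```python
import math
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Word count per sentence, using a naive split on ., ! or ? followed by whitespace."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length (population stdev / mean)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    mean = statistics.mean(lengths)
    return statistics.pstdev(lengths) / mean if mean else 0.0


def perplexity(token_logprobs: list[float]) -> float:
    """Standard perplexity: exp of the negative mean natural-log token probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


if __name__ == "__main__":
    uniform = "One two three four. Five six seven eight. Nine ten eleven twelve."
    bursty = "Stop. This sentence runs on for quite a few more words than the first."
    print(burstiness(uniform), burstiness(bursty))
```

Uniform sentence lengths drive the burstiness figure toward zero, while a one-word sentence next to a long one pushes it up, which matches the intuition in the paragraph above.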
Paste the same passage into a second classifier that uses a different method, such as GPTZero free, which is perplexity-led. If TextSight returns 78 percent AI and GPTZero returns 71 percent AI, you have two independent tools in agreement and the verdict is solid. If TextSight returns 78 percent and GPTZero returns 22 percent, the result is contested and you should not act on a single number. Two independent classifiers in agreement is the closest you get to a defensible verdict from tools alone.
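The agreement rule can be made explicit in a few lines. This is a minimal sketch under stated assumptions: the function name `cross_verify` is hypothetical, and the 50-point cut-off for counting a score as "says AI" is an illustrative choice rather than a threshold published by either tool.

```python
def cross_verify(score_a: float, score_b: float, ai_cutoff: float = 50.0) -> str:
    """Combine two independent detector scores (0-100) into one reading.

    ai_cutoff is an assumed threshold: a score at or above it counts as
    that detector saying "AI".
    """
    a_says_ai = score_a >= ai_cutoff
    b_says_ai = score_b >= ai_cutoff
    if a_says_ai and b_says_ai:
        return "likely AI"
    if not a_says_ai and not b_says_ai:
        return "likely human"
    return "inconclusive"


if __name__ == "__main__":
    print(cross_verify(78, 71))  # both detectors above the cut-off
    print(cross_verify(78, 22))  # detectors disagree
```

Keeping "inconclusive" as a first-class outcome is the point of the design: disagreement is recorded as a result, not averaged away into a middling number.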
The free tier includes 3 detector scans a day with sentence-level highlights and no signup. Paid tiers raise the quotas, lift the daily caps, and add file upload, the Chrome extension, and REST API access. Yearly billing saves 25%.
A careful reader can flag obvious AI in 60 seconds without software. These patterns are not proof on their own, but two or more together in a short passage is when the detector step becomes worth the time.
Three adjectives stacked in front of a single noun is a strong AI tell. "A robust, comprehensive, multifaceted approach" reads as AI. "A robust approach" reads as human, and so does any sentence where the noun does the work without the stack. Two or three tripled-adjective constructions per page are normal on a ChatGPT-assisted draft and almost never appear in unassisted writing.
Watch for stacked transitions across paragraph boundaries: Furthermore, Moreover, In addition, Additionally, In conclusion. ChatGPT defaults to these at the start of body paragraphs the way humans rarely do; human writers usually trust the paragraph break itself to do the work. Five paragraphs in a row that open with a generic transition is a templated structure, not a stylistic choice.
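A quick script can count how many consecutive paragraphs open with one of the transitions listed above. This is a rough sketch, assuming paragraphs are separated by blank lines; the helper names are hypothetical, and the prefix match is deliberately crude (it would also catch a paragraph starting "In additional...").

```python
import re

# Transition openers taken from the list in the guide.
TRANSITIONS = ("furthermore", "moreover", "in addition", "additionally", "in conclusion")


def transition_openers(text: str) -> list[bool]:
    """For each paragraph (split on blank lines), does it open with a listed transition?"""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [p.lower().startswith(TRANSITIONS) for p in paragraphs]


def longest_run(flags: list[bool]) -> int:
    """Length of the longest streak of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    sample = "Furthermore, a.\n\nMoreover, b.\n\nThe cat sat.\n\nIn conclusion, c."
    flags = transition_openers(sample)
    print(flags, longest_run(flags))
```

A streak of five would match the templated structure described above; shorter streaks are weaker evidence and worth reading in context.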
A short list of words appears at roughly five to seven times their normal rate in ChatGPT prose: delve, tapestry, robust, leverage (as a verb), navigate (as metaphor), underscore, showcase, myriad, multifaceted, foster. Two or three of these in a 500-word passage is unusual. Five or more makes AI involvement a near-certainty. Most undergraduate writers and most working copywriters use one or zero in a typical piece.
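Counting these words is easy to automate as a first pass. The sketch below is a simplification with clearly limited scope: it matches exact lowercase forms only, so it misses inflections such as "delves" or "leveraging", and it cannot tell "leverage" as a verb from the noun or "navigate" as metaphor from the literal sense. The function names are hypothetical.

```python
import re

# Word list taken from the guide; exact-form matching only.
TELL_WORDS = {
    "delve", "tapestry", "robust", "leverage", "navigate",
    "underscore", "showcase", "myriad", "multifaceted", "foster",
}


def tell_word_hits(text: str) -> tuple[int, int]:
    """Return (number of tell-word occurrences, total word count)."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = sum(1 for w in words if w in TELL_WORDS)
    return hits, len(words)


def tells_per_500(text: str) -> float:
    """Tell-word occurrences scaled to a 500-word passage."""
    hits, total = tell_word_hits(text)
    return 500 * hits / total if total else 0.0


if __name__ == "__main__":
    print(tell_word_hits("We delve into a robust tapestry of ideas"))
```

Scaling to 500 words makes passages of different lengths comparable with the guide's rule of thumb, though on very short texts the scaled figure exaggerates a single hit.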
There are three patterns where a high detector score is more likely to be wrong than right. Weight the flag much more cautiously when any of these apply.
Multiple peer-reviewed studies have shown that AI detectors flag essays from English-as-a-second-language writers as AI 3 to 5 times more often than essays from native US writers. The reason is structural: English learned as a second language tends to use more uniform sentence shapes, more standard vocabulary, and fewer idioms, which overlap with AI patterns. TextSight tunes its threshold roughly 40 percent lower for ESL writers than US-only competitors do, but no detector eliminates the risk completely. If the writer's first language is not English, weight the flag much more cautiously and lean on the conversation step rather than the score.
Legal memos, technical documentation, scientific abstracts, formal business briefs, and clinical case reports all genuinely read like AI because the genre conventions demand uniform structure, formal vocabulary, and predictable paragraph shapes. Detectors trained on creative writing flag this kind of prose regularly. A 78 percent AI score on a clean clinical case report is a known failure mode, not a verdict on the author.
Under 300 words, all detectors lose reliability. Below 150 words, the score is closer to a coin flip than a verdict, because the signals detectors rely on (burstiness, vocabulary patterns, structural shape) need enough text to be measurable. Never act on a high score from a short passage alone; gather more samples from the same writer before drawing a conclusion.
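The length thresholds above can serve as a guard before any score is read. This is a minimal sketch: the function name `length_reliability` and the three labels are illustrative, and word count is taken with a plain whitespace split.

```python
def length_reliability(text: str) -> str:
    """Rate how much weight a detector score deserves, based on word count alone.

    Thresholds follow the guide: under 150 words is close to a coin flip,
    under 300 words is reduced reliability.
    """
    n_words = len(text.split())
    if n_words < 150:
        return "unreliable"
    if n_words < 300:
        return "reduced"
    return "usable"


if __name__ == "__main__":
    print(length_reliability("word " * 100))
    print(length_reliability("word " * 200))
    print(length_reliability("word " * 300))
```

Running this check first makes it harder to act on a high score from a passage that was too short to measure in the first place.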