How to check if text is AI — a 5-step verification method that actually holds up.

You have a piece of writing in front of you and a quiet suspicion it was not written by the person who handed it over. Maybe a student essay. Maybe a freelancer's deliverable. Maybe a cover letter that reads a touch too polished. This guide is for the person doing the checking, not the person writing. It walks through the five-step verification method used by teachers, editors, and recruiters: paste into a detector, read the score honestly, examine sentence-level highlights, check perplexity and burstiness signals, and cross-verify with a second classifier before acting. The goal is a defensible verdict, not a gotcha.

Read this first

Why one tool is never enough.

The biggest mistake people make when checking text is running it through a single detector, looking at the headline number, and treating that as final. The honest accuracy range for the best classifiers in 2026 sits between 92 and 97 percent on long-form ChatGPT output, dropping into the low 80s on short passages and rewritten text. Even at the top of that range, a single tool gets some texts wrong, which is why one score cannot be the last word.

Different detectors weight different signals

Turnitin leans on paragraph structure and burstiness. GPTZero is built around perplexity. Originality.ai trained heavily on SEO content. TextSight emphasises sentence-level patterns and vocabulary tells. The same passage can score 22 percent on one tool and 78 percent on another, and both detectors are doing their job correctly because they are measuring different things. A single score is a snapshot of one method, not a verdict.

Agreement between two detectors is the signal

The workable rule, used by university integrity offices and editors who do this for a living, is to pick two detectors with independent methods and treat agreement as the signal. If both say AI, trust it. If both say human, trust it. If they disagree, the verdict is inconclusive and you should not act on the score alone. That principle drives the five-step method below.

Screening versus verdicts

Automated detection is best for screening at scale and worst for one-shot verdicts. Run it across a batch of submissions to surface cases worth examining closely. Then read the flagged ones carefully, examine sentence-level highlights, and have a conversation with the author. Detection is triage; the conversation is the verdict.

Three audiences

Who checks text for AI origin.

The same five-step method serves three very different jobs. The signals are the same; the threshold for acting and the next conversation are different.

Teachers grading student work

Professors and high-school teachers reviewing essays that read a little too clean. The five-step method gives them sentence-level evidence to bring into an academic-integrity conversation, rather than a single confidence number that an articulate student can argue against. The highlight map is what makes the conversation defensible; it points to specific sentences and explains why each one was flagged, so the discussion is about evidence rather than gut feeling.

Content managers reviewing freelance deliverables

Editors and content leads checking work from contractors before publishing or paying. The detection step here is mostly about catching undisclosed AI-only drafts that were not edited at all. A 78 percent score with the entire piece glowing red is a different finding from a 35 percent score concentrated in two introductory paragraphs. The first is a delivery problem; the second is normal AI-assisted craft.

Recruiters screening applicant materials

Hiring teams checking cover letters and writing samples. The verdict here is rarely "reject for AI use"; it is usually "treat this writing sample as low signal and lean on the live interview." A ChatGPT-clean cover letter does not tell you the applicant cannot write. It tells you the sample is not worth weighting heavily in your decision, and the structured interview should compensate.

The 5-step method

Paste, score, highlights, signals, cross-verify.

Roughly five minutes per text once you know the workflow. Free for the first three scans a day, no signup needed. The sequence matters; running the steps out of order leads back to single-number verdicts.

Step 1: Paste or upload the text into the detector

Open TextSight and paste the passage into the scan box, or upload the document directly on a plan that includes file upload. No account is needed for the first three checks each day. Longer passages give the classifier more signal to work with, so paste the full piece rather than a single paragraph when you can. If the text runs above the free word limit, paste the longest contiguous section rather than splicing fragments together; coherence matters to perplexity and burstiness.

Step 2: Read the overall AI versus human score

You get back a 0 to 100 score that represents the classifier's confidence the passage reads as AI-generated. Treat this as a confidence number, not a verdict. Under 20 is comfortably human. 20 to 50 is mostly human with some AI-adjacent passages, common in lightly edited drafts. 50 to 75 is contested territory where structure or vocabulary is raising flags but the text could still be a human writer with a structured style. Above 75 is high confidence the text is AI-generated.
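The bands above can be written down as a small lookup. This is a minimal sketch, not TextSight's own logic: the function name `score_band` is hypothetical, and because the prose leaves the boundary values ambiguous, the sketch assumes that exactly 50 and exactly 75 both fall in the contested band (only scores above 75 count as high confidence).

```python
def score_band(score: float) -> str:
    """Map a 0-100 detector score to the reading bands described above.

    Boundary handling (20, 50, 75) is an assumption of this sketch.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 20:
        return "comfortably human"
    if score < 50:
        return "mostly human, some AI-adjacent passages"
    if score <= 75:
        return "contested"
    return "high confidence AI"
```

The returned label is still a confidence reading, so it is best used to decide how closely to look at the highlights, not as the verdict.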

Step 3: Examine the sentence-level highlights

Open the highlight map and read which specific sentences glow red. The pattern of the highlights matters more than the headline number. Eight clustered red sentences in one paragraph tells a different story from eight red sentences scattered across the piece. Clustered red is usually AI text that was lightly polished on top; scattered red is often a structured human writer who happens to overlap with AI patterns in places. The highlight evidence is what you bring into the conversation that follows.
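One rough way to put a number on clustered versus scattered is to look at how many of the flagged sentences sit in a single unbroken run. The sketch below assumes you have already recorded, by hand or from an export, one true/false flag per sentence in reading order. The function names and the 0.5 cutoff are illustrative assumptions, and a run of consecutive sentences is only a proxy for "one paragraph".

```python
def longest_flagged_run(flags: list[bool]) -> int:
    """Length of the longest run of consecutive flagged sentences."""
    best = current = 0
    for flagged in flags:
        current = current + 1 if flagged else 0
        best = max(best, current)
    return best


def highlight_pattern(flags: list[bool], cluster_share: float = 0.5) -> str:
    """Label the highlight pattern as clustered or scattered.

    cluster_share is an assumed cutoff: the fraction of all flagged
    sentences that must sit in the single longest run.
    """
    total = sum(flags)
    if total < 2:
        return "too few highlights to judge"
    share = longest_flagged_run(flags) / total
    return "clustered" if share >= cluster_share else "scattered"
```

Eight flagged sentences in a row label as clustered; eight flagged sentences spaced evenly through a 24-sentence piece label as scattered.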

Step 4: Check perplexity and burstiness signals

Open the detail panel and read the perplexity and burstiness numbers. Perplexity measures how predictable each word is given the words around it. AI text tends to have low perplexity because the model picks high-probability next words, which produces a smooth flow. Burstiness measures sentence-to-sentence variance in length and complexity. AI text tends to be uniform, with most sentences sitting in the 16-to-22 word range. Human writing tends to be bursty, with short punchy sentences sitting next to long extended ones. Low perplexity with low burstiness is the strongest AI signal; high burstiness with mixed perplexity is closer to natural human writing even when the headline score is elevated.
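To make the two signals concrete, here is a minimal sketch of how they are commonly computed. It is not the formula any particular detector uses. Perplexity follows the standard definition (the exponential of the negative mean log-probability per token); the log-probabilities would come from a language model, which is not shown here. Burstiness is approximated as the standard deviation of sentence length in words, which is an assumption of this sketch, and the sentence splitter is deliberately naive.

```python
import math
import re
import statistics


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def sentence_lengths(text: str) -> list[int]:
    """Word counts per sentence, splitting naively after . ! or ?"""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Population standard deviation of sentence length (assumed proxy)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths)
```

A passage where every token was highly predictable gives a perplexity close to 1, and a passage of equal-length sentences gives a burstiness of 0, which is the low-perplexity, low-burstiness combination described above.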

Step 5: Cross-verify with a second independent detector

Paste the same passage into a second classifier that uses a different method, such as GPTZero free, which is perplexity-led. If TextSight returns 78 percent AI and GPTZero returns 71 percent AI, you have two independent tools in agreement and the verdict is solid. If TextSight returns 78 percent and GPTZero returns 22 percent, the result is contested and you should not act on a single number. Two independent classifiers in agreement is the closest you get to a defensible verdict from tools alone.
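The agreement rule can be sketched as a few lines of logic. The function name `cross_verify` and the 50-point dividing line are assumptions made for illustration; the guide itself only says that two tools leaning the same way count as agreement and that a split result is contested.

```python
def cross_verify(score_a: float, score_b: float, threshold: float = 50.0) -> str:
    """Compare two independent detector scores on a 0-100 scale.

    threshold is an assumed dividing line between leaning human
    and leaning AI; it is not taken from either tool.
    """
    a_leans_ai = score_a >= threshold
    b_leans_ai = score_b >= threshold
    if a_leans_ai and b_leans_ai:
        return "both lean AI"
    if not a_leans_ai and not b_leans_ai:
        return "both lean human"
    return "inconclusive"
```

With the numbers from the step above, `cross_verify(78, 71)` reports agreement on AI, while `cross_verify(78, 22)` comes back inconclusive, which is the cue not to act on either number alone.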

Plans & pricing

Detector and highlights on every tier.

Free includes 3 detector scans a day with sentence-level highlights and no signup. Paid tiers raise the quotas, lift the daily caps, and add file upload, the Chrome extension, and REST API access. Yearly billing saves 25%.

Free
$0/forever

Check a few pieces a day. No card.
  • 3 detector scans/day
  • Sentence-level highlights
  • Perplexity & burstiness
  • Plagiarism Risk bundled

Starter
$7.49/month
Billed $89.88/year (save $30)

For light reviewers and educators.
  • Unlimited detector scans
  • 20,000 AI rewriter words/mo
  • Chrome extension
  • Email support

Business
$29.99/month
Billed $359.88/year (save $120)

For departments and content teams.
  • 150,000 AI rewriter words/mo
  • REST API access
  • 5 team seats
  • Webhook integrations

Without any tool

Three patterns you can spot by reading carefully.

A careful reader can flag obvious AI in 60 seconds without software. These patterns are not proof on their own, but two or more together in a short passage is when the detector step becomes worth the time.

Tripled adjectives

Three adjectives stacked in front of a single noun is a strong AI tell. "A robust, comprehensive, multifaceted approach" reads AI. "A robust approach" reads human, and so does any sentence where the noun does the work without the stack. Two or three tripled-adjective constructions per page are normal in a ChatGPT-assisted draft and almost never appear in unassisted writing.

Transition phrase clustering

Watch for stacked transitions across paragraph boundaries: Furthermore, Moreover, In addition, Additionally, In conclusion. ChatGPT defaults to these at the start of body paragraphs the way humans rarely do; human writers usually trust the paragraph break itself to do the work. Five paragraphs in a row that open with a generic transition is a templated structure, not a stylistic choice.
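When there is a batch of texts to skim, the same check can be approximated in code. The sketch below counts the longest streak of consecutive paragraphs that open with one of the transitions listed above. It assumes paragraphs are separated by blank lines, and the function names are hypothetical.

```python
TRANSITIONS = (
    "furthermore",
    "moreover",
    "in addition",
    "additionally",
    "in conclusion",
)


def opens_with_transition(paragraph: str) -> bool:
    """True if the paragraph starts with one of the listed transitions."""
    return paragraph.strip().lower().startswith(TRANSITIONS)


def longest_transition_streak(text: str) -> int:
    """Longest run of consecutive paragraphs opening with a transition."""
    best = current = 0
    for paragraph in text.split("\n\n"):
        if not paragraph.strip():
            continue  # skip empty chunks left by extra blank lines
        current = current + 1 if opens_with_transition(paragraph) else 0
        best = max(best, current)
    return best
```

A streak of five would match the templated structure described above; a streak of one or two is not much of a signal by itself.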

"Delve" and "tapestry" vocabulary

A short list of words appears at roughly five to seven times their normal rate in ChatGPT prose: delve, tapestry, robust, leverage (as a verb), navigate (as metaphor), underscore, showcase, myriad, multifaceted, foster. Two or three of these in a 500-word passage is unusual. Five or more is a near-certain sign of ChatGPT involvement. Most undergraduate writers and most working copywriters use one or zero in a typical piece.
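Counting these words is easy to sketch. The version below matches exact word forms only, so it misses inflections such as "delves" or "leveraging", and it cannot tell "leverage" the verb from the noun or a literal "navigate" from a metaphorical one; treat the count as a rough upper bound to check by eye. The names `TELL_WORDS`, `count_tells`, and `tells_per_500_words` are illustrative.

```python
import re

TELL_WORDS = {
    "delve", "tapestry", "robust", "leverage", "navigate",
    "underscore", "showcase", "myriad", "multifaceted", "foster",
}


def count_tells(text: str) -> int:
    """Count exact-form occurrences of the listed words (case-insensitive)."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(word in TELL_WORDS for word in words)


def tells_per_500_words(text: str) -> float:
    """Scale the count to a 500-word passage for comparison across lengths."""
    total_words = len(text.split())
    if total_words == 0:
        return 0.0
    return count_tells(text) * 500 / total_words
```

Scaling to 500 words lets the result be read against the thresholds above, though on very short passages the scaled figure is noisy.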

When the flag is wrong

False-positive traps every checker should know.

Three patterns where a high detector score is more likely to be wrong than right. Weight the flag much more cautiously when any of these apply.

ESL writers

Multiple peer-reviewed studies have shown that AI detectors flag essays from English-as-a-second-language writers as AI 3 to 5 times more often than essays from native US writers. The reason is structural: learned-second-language English tends to use more uniform sentence shapes, more standard vocabulary, and fewer idioms, which overlap with AI patterns. TextSight tunes its threshold roughly 40 percent lower for ESL writers than US-only competitors, but no detector eliminates the risk completely. If the writer's first language is not English, weight the flag much more cautiously and lean on the conversation step rather than the score.

Highly structured prose

Legal memos, technical documentation, scientific abstracts, formal business briefs, and clinical case reports all genuinely read like AI because the genre conventions demand uniform structure, formal vocabulary, and predictable paragraph shapes. Detectors trained on creative writing flag this kind of prose regularly. A 78 percent AI score on a clean clinical case report is a known failure mode, not a verdict on the author.

Short passages

Under 300 words, all detectors lose reliability. Below 150 words, the score is closer to a coin flip than a verdict, because the signals detectors rely on (burstiness, vocabulary patterns, structural shape) need enough text to be measurable. Never act on a high score from a short passage alone; gather more samples from the same writer before drawing a conclusion.
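A simple length gate, run before reading any score, keeps short passages from being over-interpreted. The sketch below uses the 150- and 300-word cutoffs from the paragraph above; the function name and the returned labels are hypothetical, and words are counted by splitting on whitespace.

```python
def length_reliability(text: str) -> str:
    """Classify how much weight a detector score deserves, by word count."""
    words = len(text.split())
    if words < 150:
        return "unreliable"  # closer to a coin flip than a verdict
    if words < 300:
        return "reduced"     # all detectors lose reliability here
    return "normal"
```

Anything that comes back as unreliable or reduced is a prompt to gather more samples from the same writer before drawing a conclusion.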

FAQ

Frequently asked questions about checking text for AI.

Can I tell if text is AI just by reading it?
Sometimes, in obvious cases. Watch for tripled adjectives, transition phrase clustering (Furthermore, Moreover, In addition), and a small bank of telltale vocabulary (delve, tapestry, robust, leverage as a verb). Two or more of these patterns in a short passage is a meaningful signal. But the eye misses subtle cases, and naturally structured human writers get false-flagged regularly. Always cross-verify with a tool before acting on a reading impression.
What does an 85 percent AI score actually mean?
It means the detector is 85 percent confident the passage reads as AI-generated, based on the patterns it was trained on. It is not a claim that 85 percent of the words came from a model. Confidence is not certainty. A high score raises a question; it does not answer it on its own. Always read the sentence-level highlights alongside the headline number.
Why should I cross-verify with a second detector?
Different detectors weight different signals. TextSight emphasises sentence-level patterns and vocabulary tells. GPTZero is built around perplexity. Originality.ai trained heavily on SEO content. The same passage can score 22 percent on one tool and 78 percent on another, and both detectors are doing their job correctly because they are measuring different things. Two independent classifiers in agreement is the closest you get to a defensible verdict.
Are ESL writers more likely to be flagged?
Yes. Multiple studies have shown that ESL writers get flagged as AI 3 to 5 times more often than native US writers, because learned-second-language English tends to use more uniform sentence structures that overlap with AI patterns. TextSight tunes its threshold roughly 40 percent lower for ESL prose than US-only competitors, but no detector eliminates the risk completely. Weight the flag more cautiously when the writer's first language is not English.
When is automated detection most useful?
Automated detection is best for screening at scale and worst for one-shot verdicts. Run it across a batch of submissions to surface the cases worth looking at closely. Then read the flagged ones carefully, examine the sentence-level highlights, and have a conversation with the author. Detection is a triage tool, not a judge. Treating a single high score as a verdict is where most false-positive harm happens.
What is perplexity and why does it matter?
Perplexity measures how predictable each word is given the words around it. AI text tends to have low perplexity because the model picks high-probability next words, which produces a smooth and predictable flow. Human writers reach for unexpected words more often. Burstiness measures sentence-to-sentence variance in length and complexity. AI text tends to be uniform; human writing tends to be bursty, with short punchy sentences sitting next to long extended ones.
If a detector flags a student or job applicant, what do I do?
Treat the flag as the start of a conversation, not the verdict. Bring the sentence-level highlight evidence into the conversation. Ask the writer to walk through how they wrote a specific flagged paragraph. A genuine writer can answer easily because they remember the process. Someone who pasted from ChatGPT struggles to reconstruct work that never happened. The conversation tells you more than the score does.
How long does the full 5-step check take?
About five minutes per text once you are familiar with the workflow. Step 1 (paste) is ten seconds. Step 2 (score) is thirty seconds of reading. Step 3 (highlights) is the longest piece, usually two minutes of skimming the red sentences. Step 4 (perplexity and burstiness) is another thirty seconds. Step 5 (cross-verify) adds about a minute. The conversation that comes after a confirmed flag is separate and takes as long as it takes.

Paste, score, highlight, verify. In five minutes.

Free to try, no card. 3 detector scans a day, sentence-level highlights, perplexity and burstiness signals on every result.

Built for screening, not one-shot verdicts. Evidence first, conversation next.