Every piece of text has a statistical shape. A fingerprint, if you want to get dramatic about it. Human writing and AI writing leave different fingerprints — and that difference is exactly what detection tools are trained to find.
Here's the thing: most people assume AI detectors work like plagiarism checkers, comparing your text to a database of known AI outputs. They don't. They're doing something more abstract, and honestly more interesting. They're measuring probability.
This post breaks down what that actually means — in plain English, not a machine learning textbook.
What "Perplexity" Actually Means (And Why It Matters)
The word "perplexity" gets thrown around constantly in AI detection circles. It sounds technical. It isn't, once you strip the jargon away.
Perplexity measures how surprised a language model is by a sequence of words. High perplexity means the model didn't expect those words in that order. Low perplexity means the text was predictable — exactly what the model would have guessed.
Think about it this way. If I ask you to finish this sentence — "The stock market closed at a record..." — you'd probably guess "high" or "level." That's low perplexity. The next word is obvious. Now imagine the actual next word was "sneeze." You're surprised. That's high perplexity.
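If you want to see the arithmetic, here is a minimal Python sketch. It assumes you already have the probability a model assigned to each word that actually appeared. The probability lists below are made-up numbers for illustration, not output from any real model, and the function name is just a label chosen here. Perplexity is the exponential of the average negative log-probability.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical probabilities for the words in
# "The stock market closed at a record ___" (invented for illustration).
shared_prefix = [0.20, 0.10, 0.60, 0.30, 0.50, 0.40, 0.25]
ends_with_high = shared_prefix + [0.70]      # the model expected "high"
ends_with_sneeze = shared_prefix + [0.0001]  # the model did not expect "sneeze"

print(round(perplexity(ends_with_high), 1))
print(round(perplexity(ends_with_sneeze), 1))
```

One very unlikely word is enough to pull the whole sentence's perplexity up, which is why a single surprising choice matters more than you might expect.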
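ChatGPT, Claude, Gemini: they all generate text one token at a time, predicting a probability distribution over what comes next and then choosing from among the most likely candidates. They're trained to produce text the model itself finds probable, which is to say low-perplexity text. That's what makes their output fluent and readable. It also makes their output statistically predictable in a way human writing usually isn't.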
Human writers are surprising. We use unexpected word choices, change direction mid-sentence, and occasionally write things that feel almost wrong but aren't. AI doesn't do that naturally.
Burstiness: The Rhythm Nobody Talks About
Perplexity gets all the attention. Burstiness doesn't — which is a shame, because it's arguably easier to understand.
Burstiness describes the variance in sentence length across a passage. Human writers naturally burst: we write a long, winding sentence packed with qualifications and sub-clauses, and then a short one. Then another long one. Then three short sentences in a row. We don't keep a steady tempo.
AI text tends to be metronomic. Most sentences land in a narrow band, often somewhere between 18 and 26 words, with few outliers on either side. The rhythm is almost musical, and not in a good way. It reads like someone who learned to write by averaging everything out.
Here's a practical illustration:
Low burstiness (AI-typical):
"Artificial intelligence has become increasingly prevalent in academic settings. Students are using AI tools to assist with their writing. This has created challenges for educators who must evaluate student work. Many institutions have implemented policies to address this issue."
Four sentences. 9 words. 10 words. 11 words. 9 words. That's metronomic.
High burstiness (human-typical):
"AI is everywhere in academia now — that much is obvious. What's less obvious is how badly equipped most institutions were to handle it, because nobody saw the adoption rate coming, not even the researchers building these systems in the first place. Too slow. Too reactive. And now here we are."
Short, very long, two words, two words, five words. That's burstiness in action.
Detectors measure the standard deviation of sentence lengths. Conversational human prose typically shows a standard deviation of 8–14 words. AI output clusters closer to 4–6. That single metric, just the spread of sentence lengths, is surprisingly predictive.
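Here is a rough Python sketch of that measurement, run on the two sample passages above. It is a simplification and not any particular detector's implementation: the function names are chosen here, sentences are split naively on ., ! or ? followed by whitespace, and the dash in the second passage is typed as a plain hyphen.

```python
import re
import statistics

def sentence_lengths(text):
    """Word count per sentence, splitting after ., ! or ? followed by whitespace."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Count only tokens containing a letter or digit, so a free-standing dash is not a word.
    return [sum(1 for tok in s.split() if re.search(r"\w", tok)) for s in sentences]

def burstiness(text):
    """Sample standard deviation of sentence lengths, in words."""
    lengths = sentence_lengths(text)
    return statistics.stdev(lengths) if len(lengths) > 1 else 0.0

low = ("Artificial intelligence has become increasingly prevalent in academic "
       "settings. Students are using AI tools to assist with their writing. "
       "This has created challenges for educators who must evaluate student "
       "work. Many institutions have implemented policies to address this issue.")

high = ("AI is everywhere in academia now - that much is obvious. What's less "
        "obvious is how badly equipped most institutions were to handle it, "
        "because nobody saw the adoption rate coming, not even the researchers "
        "building these systems in the first place. Too slow. Too reactive. "
        "And now here we are.")

for name, passage in (("low", low), ("high", high)):
    print(name, sentence_lengths(passage), round(burstiness(passage), 1))
```

The naive splitter will trip over abbreviations like "e.g." or "Dr.", so treat the output as a ballpark figure. Four or five sentences is also far too few to draw conclusions from; the point is only to show what is being measured.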
Vocabulary Distribution: The Words That Give You Away
Language models have vocabulary preferences. They learn these from training data, and they show up as statistical patterns in output.
Words like "delve," "showcase," "notably," "pivotal," "transformative," "underscore," "fostering," and "leverage" appear in AI writing at rates far above their frequency in human text. This isn't because AI is bad at writing — it's because these words appear disproportionately in the formal, edited text that makes up much of the training data.
GPT-4o's vocabulary fingerprint is slightly different from Claude's, which is slightly different from Gemini's. But they all share this tendency toward certain formal, polished-sounding terms. A trained detector can actually identify which model produced a given passage with 70–80% accuracy based on vocabulary fingerprint alone.
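A crude version of this check fits in a few lines of Python. The sketch below counts how often the marker words named above appear per 1,000 words. The word list comes from this post, while the function name and the sample sentence are invented for illustration. A real detector would learn its own list and weights from data, and this version only matches exact forms, so it misses "delves" or "leveraging".

```python
import re
from collections import Counter

# Marker words named in this post; a real detector would learn its own list.
MARKER_WORDS = {"delve", "showcase", "notably", "pivotal", "transformative",
                "underscore", "fostering", "leverage"}

def marker_rate_per_1000(text, markers=MARKER_WORDS):
    """Return (marker hits per 1,000 words, Counter of which markers were hit)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0, Counter()
    hits = Counter(w for w in words if w in markers)
    return 1000 * sum(hits.values()) / len(words), hits

# Invented sample sentence, deliberately stuffed with marker words.
sample = ("This pivotal study will delve into transformative tools "
          "and notably leverage them.")
rate, hits = marker_rate_per_1000(sample)
print(round(rate, 1), dict(hits))
```

On its own, a raw rate like this means little. It becomes a signal only when compared against a baseline rate for human writing in the same genre.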
Here's a simplified table showing how perplexity and burstiness vary between passage types:
Passage Type                     | Typical Perplexity | Burstiness (Sentence-Length SD)
---------------------------------|--------------------|--------------------------------
GPT-4o response (default)        | 12–22              | Low (~4–6 words)
Claude 3.5 Sonnet response       | 15–25              | Low-Medium (~5–8 words)
Academic journal article (human) | 28–45              | Low (~4–7 words)
Blog post / journalism (human)   | 40–70              | High (~9–15 words)
ESL student essay (human)        | 20–35              | Low (~4–8 words)
Casual email (human)             | 55–90              | Very High (~12–18 words)
Notice something uncomfortable in that table. Academic journal articles — written by humans — share a perplexity and burstiness profile closer to AI output than to casual human writing. This is not a coincidence. It's a real problem, and we'll get to it.
Why Some Human Writing Accidentally Looks Like AI
Here's a pattern that surprises a lot of people: highly formal human writing often triggers AI detectors.
Academic prose is written to sound authoritative, impersonal, and precise. It favors passive voice, consistent sentence structures, formal vocabulary, and tightly controlled paragraph lengths. These are exactly the stylistic features that detectors associate with AI.
ESL (English as a Second Language) writers are hit especially hard. A 2023 study by Stanford researchers, publicized by Stanford HAI, ran seven widely used detectors on essays written by TOEFL test-takers, real humans demonstrating English proficiency. On average, the detectors flagged 61.3% of those essays as AI-generated, and nearly all of the essays were flagged by at least one tool. The reason: when you're writing in a second language at a formal register, you naturally avoid the kind of surprising, idiomatic variation that reads as authentically human.
Similarly, writing that's been heavily grammar-edited tends to lose the irregularities that signal human origin. Every time an editor smooths out a long sentence, regularizes punctuation, or replaces an unusual word with a conventional one, they're lowering perplexity and flattening burstiness. Ironically, polishing writing too aggressively can make it look more AI-generated.
This is why AI detection is genuinely hard, and why a single score shouldn't be treated as a verdict.
Structural Predictability: The Third Layer of the Fingerprint
Beyond perplexity and burstiness, there's a third dimension: structural predictability.
AI models have learned that good essays have introductions, body paragraphs with topic sentences, and conclusions. They've absorbed thousands of writing guides and model essays. The result is writing that's almost aggressively well-organized — every paragraph opens with its main idea, transitions are explicit and logical, conclusions summarize what came before.
Humans don't write like that in practice. We go off on tangents. We put the interesting detail in the middle of a paragraph instead of at the top. We sometimes end a section abruptly because we ran out of things to say. We occasionally repeat ourselves without realizing it. These "flaws" are actually statistical signals that the writing is human.
Detectors trained on large corpora learn to recognize structural templates. An essay that opens with a two-sentence hook, followed by a thesis statement, followed by three body paragraphs each starting with a transition phrase — that's a template fingerprint. It correlates heavily with AI production, even if the actual text isn't flagged on individual phrases.
How TextSight Reads the Fingerprint Differently
Most AI detectors output a binary: AI or human. Some give a percentage. Very few tell you why they scored what they scored.
TextSight gives you a Humanization Score from 0 to 100 — 0 meaning the text reads as clearly AI-generated, 100 meaning it reads strongly human. But the more useful feature is the AI Vocabulary Highlighter, which flags the specific phrases pulling your score down.
That's the difference between "your essay scored 38/100" and "these 14 specific phrases are creating an AI fingerprint in your text." The second one you can actually do something with.
The score thresholds work like this:
- 0–40: High risk. Most detectors will flag this text.
- 41–60: Grey zone. Some detectors flag it, some don't. Inconsistent results.
- 61–74: Lower risk. Passes many detectors, fails some stricter ones.
- 75–84: Passes most commercial detectors in typical use.
- 85–100: Reads strongly human. Very low flag rate.
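If you're scripting against scores like these, the bands translate into a simple lookup. This is a minimal Python sketch: risk_band is a hypothetical helper name and not part of any TextSight API, and the labels are shortened from the list above.

```python
def risk_band(score):
    """Map a 0-100 Humanization Score to the bands listed above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 40:
        return "High risk"
    if score <= 60:
        return "Grey zone"
    if score <= 74:
        return "Lower risk"
    if score <= 84:
        return "Passes most commercial detectors"
    return "Reads strongly human"

for s in (38, 55, 70, 80, 92):
    print(s, risk_band(s))
```

The list above uses whole numbers, so how a fractional score should be handled is an assumption. In this sketch a value like 40.5 falls into the next band up.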
TextSight runs perplexity analysis, burstiness measurement, and vocabulary distribution checks simultaneously. The score weights all three. That's why two pieces of text with similar average perplexity can score differently — one might have excellent burstiness that compensates, while the other is metronomic even if individual sentences are surprising.
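To see how that compensation can play out, here is a toy weighted average in Python. Everything in it is an assumption made for illustration: TextSight's actual weights and normalization aren't given in this post, so the 0.4 / 0.3 / 0.3 split, the function name, and the sub-scores are invented. Each signal is treated as a 0 to 100 sub-score where higher means more human-like.

```python
def combined_score(perplexity_signal, burstiness_signal, vocabulary_signal,
                   weights=(0.4, 0.3, 0.3)):
    """Weighted average of three 0-100 sub-scores (higher = more human-like).

    The default weights are hypothetical, not TextSight's real ones.
    """
    signals = (perplexity_signal, burstiness_signal, vocabulary_signal)
    if any(not 0 <= s <= 100 for s in signals):
        raise ValueError("each sub-score must be between 0 and 100")
    return sum(w * s for w, s in zip(weights, signals)) / sum(weights)

# Two invented texts with the same perplexity sub-score but different rhythm.
text_a = combined_score(55, 90, 70)  # varied sentence lengths
text_b = combined_score(55, 20, 70)  # metronomic sentence lengths
print(round(text_a), round(text_b))
```

With these made-up numbers the two texts land more than 20 points apart even though their perplexity sub-scores are identical, which is the effect described above.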
What Detectors Can't See
Let's be honest about the limitations.
No current AI detector is perfect. They're probabilistic tools making statistical inferences, not forensic instruments. A human can write text that every detector rates as likely AI, the equivalent of 20/100. An AI can produce text that rates the equivalent of 90/100.
What detectors handle badly:
Mixed authorship. Text that's partly AI-drafted and partly human-revised falls in genuinely ambiguous territory. The signal is mixed because the actual writing is mixed. Detectors weren't designed for this case, and it's now probably the most common scenario.
Genre-specific writing. Technical documentation, legal briefs, medical reports — these genres naturally pattern like AI regardless of who wrote them. A detector trained on general prose has no useful context here.
Short text. With fewer than 150 words, there's not enough statistical signal to make reliable inferences. Most detectors become dramatically less accurate on short passages. Some are essentially guessing at that length.
Non-English languages. Most detectors are trained predominantly on English text. Their accuracy in Spanish, French, or Hindi is significantly lower — and significantly less studied. A number of published evaluations have found near-random accuracy for non-English content on tools that claim multilingual support.
The Practical Takeaway
If you're a student worried about false positives, or a writer who uses AI as a drafting tool, the fingerprint concept is useful in a very concrete way: now you know what to change.
Raising perplexity means unexpected word choices — lean into them. Add specific, unusual details. Choose the precise word over the safe word. Raising burstiness means varying sentence length aggressively — write some very long sentences, some two-word sentences, mix them up deliberately. Breaking structural predictability means going off-script: put the interesting detail at the end of a paragraph, start a section with a question, skip the summary conclusion and just stop.
None of this is about "fooling" a detector. It's about writing that actually sounds like a human wrote it — which is the goal anyway.
The fingerprint AI leaves behind isn't a quirk or a glitch. It's the direct result of how language models work. Once you understand that, you know exactly what to do about it.