What Turnitin's AI Detection Actually Does
If you are a student, you have probably wondered whether Turnitin can actually catch ChatGPT-written work. If you are a teacher, you have probably wondered whether you can trust Turnitin's AI indicator when it flags a student's submission.
Both are fair questions. And both deserve a straight answer.
We ran 12 different writing samples through Turnitin's AI detection system — ranging from raw ChatGPT output to heavily edited human-AI hybrid work — and documented exactly what happened. Here is what we found.
How Turnitin's AI Detection Works
Turnitin launched its AI writing detection feature in April 2023. It was built in response to the explosion of ChatGPT use in academic settings and is now available to institutions that subscribe to its AI detection add-on.
The system works by analysing the statistical patterns in submitted text. Like most AI detectors, it looks at two core signals:
Perplexity — how predictable each word choice is given the surrounding context. AI models tend to choose statistically likely words. Human writers are more unpredictable.
Burstiness — how much the sentence length and complexity varies throughout the text. Humans write in irregular rhythms. AI tends to produce more consistent, even output.
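To make the two signals concrete, here is a deliberately simplified sketch in Python. This is not Turnitin's model (theirs is a trained classifier, and production detectors score perplexity with neural language models rather than word counts), but it shows the shape of both measurements:

```python
import math
import re
from collections import Counter

def burstiness(text: str) -> float:
    """Spread (population std dev) of sentence lengths, in words.
    Low values reflect the even rhythm typical of AI output; higher
    values reflect the irregular rhythm more typical of human writing."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    mean = sum(lengths) / len(lengths)
    return math.sqrt(sum((n - mean) ** 2 for n in lengths) / len(lengths))

def unigram_perplexity(text: str, reference: Counter) -> float:
    """Perplexity of `text` under a toy unigram model built from some
    reference word-count corpus. Lower perplexity means more statistically
    predictable word choices. Add-one smoothing keeps unseen words finite."""
    total = sum(reference.values())
    vocab = len(reference)
    words = re.findall(r"[a-z']+", text.lower())
    log_prob = sum(
        math.log((reference[w] + 1) / (total + vocab)) for w in words
    )
    return math.exp(-log_prob / len(words))
```

Read together, a document that scores low on both measures looks statistically like machine output, which is exactly the pattern our Category A results below bear out.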
Turnitin claims a false positive rate of less than 1% at its 20% AI-written threshold. In plain terms: for every 100 fully human documents, fewer than one should receive a score of 20% or higher. So when Turnitin reports that 20% or more of a document was AI-generated, it is asserting a high degree of confidence.
What that number does not tell you is how the model behaves in the large grey zone between 0% and 20%, where Turnitin makes no confidence claim at all. That grey zone is where most real-world academic submissions actually live.
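One way to internalise what that threshold claim does and does not cover is as a decision rule. The function below is our reading of Turnitin's published claim, not anything the product itself exposes (Turnitin reports only a percentage):

```python
def interpret_ai_score(ai_pct: float) -> str:
    """How to read a Turnitin AI percentage, per Turnitin's stated claims.
    The 20% cutoff and the sub-1% false positive figure come from its
    documentation; the grey-zone label is our gloss, not Turnitin's."""
    if ai_pct >= 20:
        return "flagged: Turnitin claims a <1% false positive rate here"
    if ai_pct > 0:
        return "grey zone: no stated confidence below the 20% threshold"
    return "no AI writing reported"
```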
Our Test Setup
We created 12 writing samples across four categories:
Category A — Pure ChatGPT output (3 samples)
We asked ChatGPT (GPT-4o) to write a 500-word essay on three different topics: climate policy, the causes of World War One, and the ethics of social media. No editing. Submitted exactly as generated.
Category B — Lightly edited ChatGPT (3 samples)
Same essays as Category A, but we made minor changes: fixed a few sentences, swapped some vocabulary, added a personal anecdote to one.
Category C — Heavily humanised ChatGPT (3 samples)
Same source essays, but put through a full humanisation pass — restructured paragraphs, changed sentence rhythm, added specific examples and original opinions throughout.
Category D — Fully human writing (3 samples)
Three essays written entirely by a human writer with no AI involvement. Used as our control group.
All samples were submitted to Turnitin through an institutional test account.
The Results
Category A: Pure ChatGPT — Detected Almost Every Time
All three pure ChatGPT samples were flagged with AI scores between 78% and 94%. Turnitin had no difficulty identifying clean, unedited AI output.
This should not surprise anyone. Raw ChatGPT output has very distinctive patterns — consistent sentence length, predictable vocabulary, smooth transitions that never quite feel like a real person writing. Turnitin's model has been trained extensively on this kind of content.
What this means: If a student submits unedited ChatGPT work, they will almost certainly be flagged.
Category B: Lightly Edited ChatGPT — Mixed Results
This is where it got interesting. The three lightly edited samples produced scores of 61%, 44%, and 29%.
The 61% result was the sample where we made the fewest changes — just vocabulary swaps and one added sentence. The 29% result was the one where we added a personal anecdote that broke the AI's structural pattern.
What this tells us is that Turnitin's detection weakens significantly as soon as human-sounding irregularities are introduced. Adding even a single genuine first-person experience — something AI cannot fabricate convincingly — dropped the score by over 30 percentage points.
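You can watch that mechanism in miniature with the burstiness sketch from earlier. The passage below mimics the even rhythm of raw AI output; splicing in one irregular, first-person sentence (a made-up example, not taken from our test essays) sends the spread of sentence lengths sharply upward:

```python
# Uses burstiness() from the sketch earlier in this article.
ai_like = (
    "Social media shapes public discourse. It amplifies popular opinions. "
    "It rewards engagement over accuracy. It changes how people communicate."
)
# The same passage with one personal, irregularly shaped addition.
edited = ai_like + (
    " I remember deleting my own account in 2021, twice, and caving "
    "both times within a week. It rewires you."
)

print(burstiness(ai_like))  # ~0.4: near-uniform four-to-five-word sentences
print(burstiness(edited))   # ~4.4: one long aside breaks the rhythm
```

A real detector measures far more than sentence length, but the direction of the effect is the same: irregularity reads as evidence of a human hand.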
What this means: Light editing is enough to create uncertainty in Turnitin's output. A score between 20% and 60% puts both the student and the teacher in genuinely ambiguous territory.
Category C: Heavily Humanised ChatGPT — Mostly Passed
Two of the three heavily humanised samples scored below Turnitin's 20% threshold — at 14% and 8%. The third scored 31%, likely because the underlying essay structure remained recognisably AI-generated even after the vocabulary and style were overhauled.
This result highlights the core limitation of any AI detector that relies purely on surface-level text patterns. Once the statistical fingerprint of AI output has been sufficiently disrupted — through restructuring, rewording, and genuine human addition — the detector loses its signal.
What this means: Turnitin cannot reliably detect AI content that has been properly humanised. This is a significant gap for academic integrity enforcement.
Category D: Fully Human Writing — One False Positive
Three human-written samples. Two scored 0%. One scored 17%.
That 17% result came from an essay written in a formal academic style, with consistent paragraph structure and conventional vocabulary. In other words, the human writer happened to write in a way that resembled AI output.
This is the false positive problem that Turnitin warns about in its own documentation. Students who naturally write in formal, structured styles, particularly those who are not native English speakers, are at higher risk of being incorrectly flagged.
What this means: A Turnitin AI score is not proof of AI authorship. It is a signal that requires human judgement.
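All twelve results, side by side:

```
Category                        Scores            At or above 20%
A: Pure ChatGPT                 78% to 94%        3 of 3
B: Lightly edited ChatGPT       61%, 44%, 29%     3 of 3
C: Heavily humanised ChatGPT    31%, 14%, 8%      1 of 3
D: Fully human (control)        17%, 0%, 0%       0 of 3
```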
What Turnitin Is Actually Good At
Based on our testing, Turnitin's AI detection is genuinely effective in one specific scenario: catching students who submit raw or minimally edited ChatGPT output.
For that use case — which is probably the majority of lazy AI misuse — it works well. A student who generates an essay and submits it directly is going to get caught.
It is also reasonably good at flagging when large chunks of a document are AI-generated, even if some sections are human-written. The sentence-level breakdown Turnitin provides lets instructors see exactly which parts of a submission raised flags, which is genuinely useful for targeted conversations.
Where Turnitin Falls Short
It cannot detect well-edited AI content. Our Category C results make this clear. Once a student invests meaningful effort into rewriting and personalising AI-generated content, Turnitin's detection rate drops below its own stated confidence threshold.
It produces false positives for formal writers. Students who write in structured, formal academic styles — including many international students — may be flagged incorrectly. This has real consequences for students who are wrongly suspected.
It does not assess quality or learning. A submission could score 0% on AI detection and still be a poorly researched, derivative piece of work. And a submission could score 40% and still represent genuine intellectual engagement from a student who used AI as a research starting point.
It creates an arms race. Every time Turnitin improves its detection, AI humanisation tools improve in response. This is a game that detection alone cannot win.
What This Means for Students
The practical reality is this: Turnitin will catch you if you submit unedited AI content. If that is what you were planning, stop.
But the more important conversation is about what AI use actually means for your education. Using ChatGPT to generate an essay and submitting it unchanged is not just a risk — it is a waste. You learn nothing, you develop no skill, and you leave yourself exposed.
The smarter approach — and the approach that will still be valuable in five years — is to use AI as a thinking tool, not a writing tool. Use it to understand a topic faster, generate an outline, identify arguments you had not considered. Then write the actual work yourself, informed by that research. That process is both undetectable and genuinely educational.
What This Means for Teachers and Institutions
A Turnitin AI score above 20% is a reason to have a conversation, not a reason to issue a penalty.
The false positive rate — though low in aggregate — is real at the individual level. A student who writes formally and scores 17% or 22% deserves the chance to explain their process before any action is taken. Many institutions are now requiring students to submit process documentation alongside their final work for exactly this reason.
More fundamentally, detection-first policies miss the point. The question is not whether a student used AI — it is whether they learned something. Assessment design that makes AI less useful (oral exams, in-class writing, project-based work) addresses the underlying challenge in a way that detection alone never can.
How TextSight Compares to Turnitin for AI Detection
Turnitin is designed specifically for academic submission review — it integrates into LMS platforms and processes documents at scale for institutions.
TextSight is designed for real-time content analysis, sentence-level transparency, and use cases beyond academic submissions — including content marketing, journalism, and professional writing.
The key difference is who sees the detail, and when. Turnitin's sentence-level breakdown goes to the instructor after submission; TextSight shows the writer which sentences are driving a high AI score before anything is submitted, so you can understand and revise specific lines rather than just receive a percentage. For anyone who wants to understand their writing rather than simply pass a detection check, that visibility changes how you approach revision.
You can run any text through TextSight free — no account required — and get a result in under two seconds.
The Bottom Line
Turnitin detects ChatGPT reliably when the content is raw and unedited. As soon as meaningful human intervention occurs, its detection rate drops — sometimes significantly.
This does not mean AI detection is useless. It means AI detection is one signal among several that educators should use, alongside assessment design, student conversation, and process documentation.
For students: the safest approach is also the most educational one. Use AI to think, not to write.
For teachers: treat AI scores as the start of a conversation, not the end of one.
And for anyone who wants to check their own content before it reaches a Turnitin review — TextSight gives you the same kind of sentence-level analysis in seconds, for free.