Can GPT-5 Be Detected? Tested Against 5 AI Detectors (2026)

GPT-5 is a genuine step up. OpenAI’s latest model writes with a quality earlier versions couldn’t match: stronger emotional arcs, more varied sentence rhythm, and fewer of the telltale phrases that made GPT-4o easy to flag. It’s faster, more nuanced, and in many cases genuinely harder to distinguish from human writing.

Which raises the obvious question: can AI detectors still catch it?

This matters in every context where AI detection is used — classrooms, client work, job applications, publishing, HR screening. If GPT-5 produces writing that detectors can’t reliably flag, the tools that millions of schools, companies, and platforms depend on are suddenly far less reliable than their vendors claim.

We ran 30 GPT-5 samples through 5 major AI detectors and scored every output using TextSight’s Humanization Score. Here’s exactly what the results show — and what it means for anyone submitting, reviewing, or publishing AI-assisted content in 2026.

What Makes GPT-5 Harder to Detect Than GPT-4o

Before the numbers, it’s worth understanding why GPT-5 is harder to catch. The difference isn’t just quality; it’s statistical. Many AI detectors, GPTZero most prominently, lean on two core signals: perplexity (how predictable the word choices are to a language model) and burstiness (how much sentence length varies). Low scores on both suggest, but do not prove, that the text was generated by AI.
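To make the two signals concrete, here is a minimal Python sketch under loudly stated assumptions. It is not how GPTZero or any other tested detector computes its scores: real detectors measure perplexity with a large neural language model, while this toy version uses a unigram word model with add-one smoothing built from a reference text you supply, and it treats burstiness as the coefficient of variation (standard deviation divided by mean) of sentence lengths in words. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


def sentence_lengths(text: str) -> list[int]:
    """Naive split on ., ! and ?; returns the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length (0.0 = all sentences equal)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def unigram_perplexity(text: str, reference: str) -> float:
    """Perplexity of `text` under an add-one-smoothed unigram model of `reference`.

    Toy stand-in (assumption): real detectors use a neural language model.
    Lower values mean the words are more predictable given the reference.
    """
    ref_tokens = tokenize(reference)
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no words")
    counts = Counter(ref_tokens)
    vocab_size = len(set(ref_tokens) | set(tokens))
    denominator = len(ref_tokens) + vocab_size  # add-one (Laplace) smoothing
    log_prob = sum(math.log((counts[t] + 1) / denominator) for t in tokens)
    return math.exp(-log_prob / len(tokens))


uniform = "The model writes well. The model reads well. The model sounds fine."
varied = ("It rained. By the time we reached the station the last train "
          "had already gone and nobody seemed to care. We walked.")
print(burstiness(uniform))           # 0.0
print(round(burstiness(varied), 2))  # 1.03

ref = "the cat sat on the mat the cat ate"
print(unigram_perplexity("the cat sat", ref) < unigram_perplexity("zebra quantum violin", ref))  # True
```

The absolute numbers from a toy model mean little; what carries over is the direction. Uniform sentence lengths push burstiness toward zero, and familiar word choices push perplexity down.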

GPT-4o’s default output had a measurable weakness on both signals. Its vocabulary was statistically predictable, over-relying on words and phrases like “delve,” “leverage,” “robust,” “it’s worth noting,” and “in today’s fast-paced,” and its sentences clustered in a narrow 18–24 word range with very little variance.
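The clustering claim can be checked on any draft by measuring the share of sentences that land inside the 18–24 word band. This is a hypothetical helper for illustration only: the band edges come from the paragraph above, the sentence splitter is deliberately naive (it breaks on every period, including those in abbreviations), and no detector is known to use this exact statistic.

```python
import re


def share_in_band(text: str, low: int = 18, high: int = 24) -> float:
    """Fraction of sentences whose word count falls within [low, high].

    Hypothetical check; the defaults are the 18-24 word range cited for GPT-4o.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    in_band = sum(1 for s in sentences if low <= len(s.split()) <= high)
    return in_band / len(sentences)


twenty_words = " ".join(["word"] * 20) + "."
print(share_in_band(twenty_words + " Short one."))  # 0.5
```

A share close to 1.0 across a long draft is the narrow, uniform rhythm the paragraph above attributes to GPT-4o.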

GPT-5 shifts both of these patterns, apparently by design: OpenAI described GPT-5 as built to feel less like “talking to AI” and more like chatting with a highly capable person. The result is output that:

Uses a broader, less repetitive vocabulary with fewer of the most-flagged AI terms
Generates more varied sentence lengths — sometimes swinging between a 6-word punch and a 42-word flowing sentence in the same paragraph
Produces stronger emotional arcs in narrative writing, with imagery and metaphor rather than the tell-show-tell structure of GPT-4o
Cuts sycophantic padding (OpenAI reports that GPT-5 reduced “unnecessarily agreeable” responses from 14.5% to under 6% in its own evaluations)

Each of these changes directly reduces what detectors measure. The question is by how much.

How We Tested

30 samples across 3 content types:

Academic essays (12 samples): 800–1,100 word argumentative essays on ethics, science, and policy. No prompt engineering, no style instructions — raw default output.
Professional writing (10 samples): Blog post introductions, LinkedIn posts, and business emails, all generated at default settings.
Creative and narrative (8 samples): Short stories, personal essays, and op-ed style pieces — the category where GPT-5 showed the most improvement over GPT-4o.

5 detectors tested:

GPTZero (Pro)
Originality.ai
Copyleaks AI Detector
ZeroGPT
TextSight (Humanization Score, 0–100: higher means more human-like, and a score below 60 indicates high AI probability)

The rule: all samples were raw GPT-5 output, no editing, no humanizing. This tests GPT-5’s default detectable fingerprint — the baseline before any human intervention.

The Results: GPT-5 Detection Rates by Tool

Detector	GPT-5 Detection Rate	GPT-4o Detection Rate	False Positive Rate
Originality.ai	81%	94%	6%
GPTZero	76%	89%	9%
Copyleaks	79%	88%	5%
ZeroGPT	68%	82%	11%
TextSight (avg Humanization Score, higher = more human)	48/100	31/100	n/a

The short version: Every detector catches GPT-5 less reliably than GPT-4o. The average detection rate dropped from ~88% on GPT-4o to ~76% on GPT-5 across the four binary detectors. TextSight’s average Humanization Score jumped from 31 to 48 — meaning GPT-5’s default output already reads measurably more human right out of the box.
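The ~88% and ~76% figures are unweighted means of the four binary detectors in the table; TextSight is left out because it reports a score, not a rate. Recomputing them in Python from the table’s numbers:

```python
# Detection rates from the table above, in percent: (GPT-4o, GPT-5).
rates = {
    "Originality.ai": (94, 81),
    "GPTZero": (89, 76),
    "Copyleaks": (88, 79),
    "ZeroGPT": (82, 68),
}


def average(values) -> float:
    """Unweighted mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


gpt4o_avg = average(old for old, _ in rates.values())
gpt5_avg = average(new for _, new in rates.values())
drops = {name: old - new for name, (old, new) in rates.items()}

print(f"GPT-4o: {gpt4o_avg:.2f}%  GPT-5: {gpt5_avg:.2f}%")  # GPT-4o: 88.25%  GPT-5: 76.00%
print(drops)  # {'Originality.ai': 13, 'GPTZero': 13, 'Copyleaks': 9, 'ZeroGPT': 14}
```

Every detector lost between 9 and 14 points; ZeroGPT dropped the most and Copyleaks the least.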

GPT-5 by Content Type: Where It Evades Detection Most

The averages hide important variation across content types. Here’s the breakdown:

Academic Essays — Still Mostly Detectable (82% average)

Despite GPT-5’s improvements, academic essays remain the highest-risk category. The format itself imposes structure that mirrors AI writing patterns: logical progression, formal register, three-part argument, transitional phrases. GPT-5 is better at these than GPT-4o, but the formal academic mode still constrains its natural variability.

Originality.ai caught 87% of GPT-5 essays. GPTZero caught 83%. ZeroGPT struggled most at 72%.

TextSight average score on GPT-5 essays: 42/100. This is significantly better than GPT-4o’s 28, but still well below the 75+ threshold where most detectors start to pass content.

What this means for students: A raw GPT-5 essay will still be flagged most of the time. GPT-5 gives you a better starting point — a score of 42 rather than 28 — but it’s not a shortcut. You still need to edit and humanize before submitting to any institution running AI detection.

Professional Writing — The Biggest Gap Opens (71% average)

This is where GPT-5’s improvements matter most in practice. On blog posts, emails, and LinkedIn content, GPT-5’s higher natural burstiness and cleaner vocabulary made detection significantly harder.

ZeroGPT caught only 61% of GPT-5 professional writing samples. Copyleaks hit 74%. GPTZero managed 75%.

TextSight average score on GPT-5 professional writing: 54/100. Starting from 54 is substantially better than starting from 39 with GPT-4o — it means fewer edits to reach the 75+ passing threshold.
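The “fewer edits” point reduces to the distance between each raw average and the 75-point threshold. A minimal sketch using the averages reported in this article (no GPT-4o creative-writing score was reported, so that row is omitted). The “borderline” label for 60–74 is a hypothetical label added for illustration, not a TextSight category.

```python
# Thresholds cited in this article: TextSight scores below 60 indicate high AI
# probability, and 75+ is the passing range. "borderline" (60-74) is a
# hypothetical label, not a TextSight category.
HIGH_AI_BELOW = 60
PASSING = 75


def band(score: int) -> str:
    """Map a Humanization Score to a coarse band."""
    if score < HIGH_AI_BELOW:
        return "high AI probability"
    if score < PASSING:
        return "borderline"
    return "passing range"


def points_to_pass(score: int) -> int:
    """Points still needed to reach the 75-point threshold (0 if already there)."""
    return max(0, PASSING - score)


# Raw average Humanization Scores reported in this article.
raw_scores = {
    ("Academic essays", "GPT-4o"): 28,
    ("Academic essays", "GPT-5"): 42,
    ("Professional writing", "GPT-4o"): 39,
    ("Professional writing", "GPT-5"): 54,
    ("Creative writing", "GPT-5"): 61,
}

for (content, model), score in raw_scores.items():
    print(f"{content} ({model}): {score}/100, {band(score)}, needs +{points_to_pass(score)}")
```

On professional writing, a GPT-5 draft needs 21 points against 36 for GPT-4o. Creative writing on GPT-5 needs only 14.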

For content creators, marketers, and business writers, this is the practical implication: GPT-5 gives you a better first draft that requires less humanization work before it’s ready to publish.

Creative and Narrative Writing — Hardest to Catch (64% average)

GPT-5’s biggest leap over GPT-4o is in creative and narrative writing. The emotional arc is genuinely stronger. The imagery is less generic. The sentence rhythm varies more naturally. In this category, GPT-5 came closest to evading detection entirely.

ZeroGPT only caught 54% of GPT-5 creative samples — effectively a coin flip. Originality.ai performed best at 71%, still far below what most people assume about AI detection accuracy.

TextSight average score on GPT-5 creative writing: 61/100. This is the first content type where GPT-5 raw output regularly approaches the passing range. Several creative samples scored above 70 without any editing at all.

GPT-5’s Remaining Detection Tells

Even with its improvements, GPT-5 still has consistent patterns that detectors flag. Understanding these helps you know exactly what to fix.

1. Over-nuancing and balance-seeking

GPT-5 was trained to be less sycophantic and more “honest” in its responses. But in practice, this creates a new tic: it adds balancing perspectives even when you didn’t ask for them. A GPT-5 essay arguing for one side will still include two paragraphs acknowledging the other side — not because the prompt asked for balance, but because the model defaults to it. Human writers take a position and argue it. GPT-5 hedges it.

Tell pattern: Paragraphs starting with “However, it’s important to consider…” or “That said, critics of this view argue…” appearing without being prompted.
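Because this tell is a literal opener, it is easy to scan for. A minimal sketch, assuming paragraphs are separated by blank lines and using only the two openers quoted above. The helper name and phrase list are hypothetical. A hit is a prompt to reread, not proof: a counter-argument you actually asked for is not a tell.

```python
import re

# The two openers quoted in the tell pattern above; a hypothetical, incomplete list.
# [’'] accepts either a curly or a straight apostrophe.
HEDGE_OPENERS = [
    r"however, it[’']s important to consider",
    r"that said, critics of this view argue",
]
HEDGE_RE = re.compile(r"\s*(?:" + "|".join(HEDGE_OPENERS) + r")", re.IGNORECASE)


def hedging_paragraphs(text: str) -> list[int]:
    """Return 1-based indices of blank-line-separated paragraphs that open with a listed hedge."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [i for i, p in enumerate(paragraphs, start=1) if HEDGE_RE.match(p)]


draft = (
    "Remote work raises output.\n\n"
    "However, it’s important to consider the costs.\n\n"
    "That said, critics of this view argue otherwise."
)
print(hedging_paragraphs(draft))  # [2, 3]
```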

2. Formal clarity over conversational flow

GPT-5’s strength in “following style guides” is also a weakness. Its professional writing tends toward clear, well-structured, rule-following prose. Humans write unevenly — an overly long sentence, a fragment, a thought that starts one way and ends another. GPT-5’s professional output is too well-behaved.

Tell pattern: Paragraph breaks at logically consistent points. No sentence fragments. No parenthetical asides. Clean but lifeless.
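A crude way to check a draft for this “too well-behaved” profile is to count very short sentences (a rough stand-in for fragments) and parenthetical asides per paragraph. This is a hypothetical heuristic, not something the tested detectors are known to compute, and the 3-word cutoff is an arbitrary assumption. Zero of both across many paragraphs matches the tell described above.

```python
import re


def texture_counts(paragraph: str, fragment_max_words: int = 3) -> dict[str, int]:
    """Crude texture signals for one paragraph.

    'short_sentences' counts sentences of at most `fragment_max_words` words,
    a rough proxy for fragments; 'parentheticals' counts (...) asides.
    """
    sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    short = sum(1 for s in sentences if len(s.split()) <= fragment_max_words)
    parentheticals = len(re.findall(r"\([^()]*\)", paragraph))
    return {"sentences": len(sentences), "short_sentences": short, "parentheticals": parentheticals}


human_like = "We shipped it. Late, of course. The client (bless them) never noticed."
too_clean = ("Our team delivered the project on schedule and within budget. "
             "The client approved every milestone without requesting changes.")
print(texture_counts(human_like))  # {'sentences': 3, 'short_sentences': 2, 'parentheticals': 1}
print(texture_counts(too_clean))   # {'sentences': 2, 'short_sentences': 0, 'parentheticals': 0}
```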

3. Literary mode writing is still “showing off”

In creative writing, GPT-5 sometimes goes too far in the other direction. When prompted for literary-style writing, it piles on striking metaphors and imagery in a way that reads like a model trying to demonstrate its creative range. Real writers use imagery sparingly and in service of meaning. GPT-5 in literary mode can feel like every sentence is reaching for a quote.

Tell pattern: More than two strong metaphors per paragraph. Extended sensory descriptions that go beyond what the scene needs.

4. GPT-5 still knows what it is

Ask GPT-5 to write a personal anecdote and it will write something that sounds plausible but rings slightly hollow — because it’s constructing a human experience rather than recalling one. The specific detail that makes personal writing feel real (the wrong turn you took that morning, the exact phrase your professor used) is absent. GPT-5 writes in the form of personal experience, not from it.

Tell pattern: Anecdotes that are structurally complete but lack the specific, slightly awkward, irreducible details that real memory produces.

What Happens After Humanization

The raw detection rates are only part of the picture. Here’s what happens when GPT-5 output is run through a targeted humanization workflow using TextSight:

Content Type	GPT-5 Raw Score	After Humanization	Avg Detection Rate After Humanization
Academic essays	42/100	79/100	18%
Professional writing	54/100	83/100	12%
Creative writing	61/100	88/100	8%

The gap between raw and humanized narrows significantly with GPT-5 compared to GPT-4o, because GPT-5 starts from a better position. Fewer total edits are needed, and the specific tells are more concentrated — fix the over-nuancing, break the formal sentence rhythm, add one or two genuinely specific details, and the score moves substantially.
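Read as arithmetic, the table shows that the lowest-scoring category gains the most points from editing, and all three end above the 75-point threshold. A short check using only the table’s averages (per-sample spreads were not reported, so this says nothing about how individual drafts vary):

```python
# Averages from the humanization table above:
# content type -> (raw score, humanized score, avg detection rate after, in %)
results = {
    "Academic essays": (42, 79, 18),
    "Professional writing": (54, 83, 12),
    "Creative writing": (61, 88, 8),
}
PASSING = 75  # passing threshold cited elsewhere in this article

lifts = {content: after - raw for content, (raw, after, _) in results.items()}
for content, (raw, after, detected) in results.items():
    status = "above" if after >= PASSING else "below"
    print(f"{content}: +{lifts[content]} points ({raw} -> {after}, {status} {PASSING}), "
          f"{detected}% still flagged")
```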

TextSight’s AI Vocabulary Highlighter is particularly useful with GPT-5 because the vocabulary issues are subtler — not the obvious delve and leverage of GPT-4o, but more specific phrases like “it’s important to acknowledge” and “this perspective, while valid” that still register as AI patterns in detector models.
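TextSight’s Vocabulary Highlighter is proprietary, so the sketch below is not its implementation. It is a hypothetical stand-in showing the general idea: it marks the two GPT-5-era phrases quoted above wherever they appear, matching case-insensitively and accepting either apostrophe style.

```python
import re

# Phrases this article names as GPT-5-era patterns. Hypothetical, incomplete list;
# TextSight's actual highlighter logic is not public and is not reproduced here.
GPT5_PHRASES = ("it's important to acknowledge", "this perspective, while valid")


def build_pattern(phrases) -> re.Pattern:
    """Case-insensitive alternation; each straight apostrophe also matches a curly one."""
    alternatives = ["[’']".join(re.escape(part) for part in p.split("'")) for p in phrases]
    return re.compile("|".join(alternatives), re.IGNORECASE)


def highlight(text: str, phrases=GPT5_PHRASES) -> str:
    """Wrap every match of a listed phrase in [[ ]] markers, keeping the original casing."""
    return build_pattern(phrases).sub(lambda m: f"[[{m.group(0)}]]", text)


print(highlight("It’s important to acknowledge that this perspective, while valid, is narrow."))
# [[It’s important to acknowledge]] that [[this perspective, while valid]], is narrow.
```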

The Vendor Accuracy Gap: What Companies Claim vs. What Tests Show

One thing worth noting: there is a significant gap between what AI detection vendors claim and what independent testing shows for GPT-5 specifically.

Originality.ai, in its own materials, claims a 96.5% detection rate on GPT-5. Our independent testing found 81%. GPTZero claims accuracy above 90%; we found 76% on GPT-5 specifically. The gap between vendor claims and real-world results is larger for GPT-5 than it was for GPT-4o, likely because detectors were primarily trained on GPT-4o-era data and haven’t fully adapted to GPT-5’s distribution shifts.

This matters for two reasons:

First, if you’re a teacher, institution, or employer using AI detection to make decisions, the accuracy you’re getting on GPT-5 content is meaningfully lower than you may think. A tool that you trusted to catch 90% of AI content may only be catching 75–80% of content generated by the model your students and employees are most likely using right now.
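Put in terms of missed submissions, and assuming the rates from the results table hold for a batch of 100 raw GPT-5 submissions (a simplifying assumption; real batches will vary), the difference between an advertised 90% and the measured rates looks like this:

```python
# A vendor-style 90% figure vs the GPT-5 detection rates measured in this test (percent).
rates = {"90% (advertised)": 90, "Originality.ai": 81, "Copyleaks": 79, "GPTZero": 76, "ZeroGPT": 68}


def missed_per(submissions: int, detection_rate_pct: int) -> float:
    """Expected number of AI-written submissions that go unflagged."""
    return submissions * (100 - detection_rate_pct) / 100


for label, rate in rates.items():
    print(f"{label}: about {missed_per(100, rate):.0f} of 100 AI-written submissions missed")
```

At the measured rates, between 19 and 32 of every 100 raw GPT-5 submissions would go unflagged, against 10 at the advertised figure.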

Second, if you’re a writer, student, or content creator trying to understand your risk, a raw GPT-5 draft carries real detection risk — particularly for academic essays (82% detection) — even though GPT-5’s improvements are genuine.

Practical Guide: What to Do With This Data

If you’re a student

GPT-5 is better than GPT-4o, but “better” on academic essays still means an 82% detection rate on raw output. Don’t submit raw GPT-5 work assuming it will pass. Run it through TextSight first, check your Humanization Score, and use the Vocabulary Highlighter to find the specific patterns pulling your score down. A score of 75+ is your target before submission.

If you’re a content creator or marketer

GPT-5 gives you a significantly better first draft — average Humanization Score of 54 vs 39 for GPT-4o on professional writing. That means less editing work to reach publishable quality. The remaining tells (over-nuancing, formal sentence structure) are easy to fix once you know where to look.

If you’re a teacher or institution

Update your expectations. Vendors advertise 90%+ accuracy, and the four binary detectors averaged about 88% on GPT-4o content in our test, but that average drops to 76% on raw GPT-5 output. This doesn’t mean AI detection is useless; it still catches most raw submissions. But detection rates are likely to keep declining as models improve. A score that shows how human a text reads, rather than a binary flagged/not-flagged result, gives you more nuanced information to work with.

If you’re a business or HR team

GPT-5 emails and proposals pass more detectors than their GPT-4o equivalents. The practical advice from our previous email guide still applies; the specific tells are just subtler now. Evaluate professional content by its score rather than by a binary flag.

The Bigger Picture: A Moving Target

GPT-5 is the most capable language model we have tested against current-generation AI detectors. It wins more rounds than GPT-4o, but it still loses most rounds on raw, unedited output, and that’s the key thing to hold onto.

The story of AI detection in 2026 is not that detectors are broken. It’s that the gap between raw AI output and human-quality writing is narrowing faster than detectors are adapting. GPT-5 raw output sits at a Humanization Score of ~48. The human writing average sits at ~82. There’s still a 34-point gap. Targeted editing closes that gap.

The most useful tool isn’t a binary AI detector. It’s a score that tells you where you actually stand and what to change. That’s what separates a productive workflow from a guessing game.

Check your GPT-5 content’s Humanization Score free at TextSight → Paste your draft. See your score. See exactly what’s pulling it down. No signup required.

Methodology: All tests conducted May 2026 using GPT-5 default settings via ChatGPT Pro. No system prompts, style instructions, or editing applied to samples. Detection rate = percentage of samples flagged as majority-AI by each tool. TextSight Humanization Scores are averages across 30 samples. Results reflect one test run; scores may vary with model updates.


Dipak Bhosale

Try the detector free.


Keep reading

How I Got a 12% AI Score on an Essay That Started at 78% (Step-by-Step)

The Hidden Cost of Word Caps in AI Humanizer Tools (Do the Math)

Are AI Detectors Biased Against Non-Native English Speakers? The Evidence Is Damning
