
AI detector re-fit to catch GPT-5 output, even as the tells get subtler.

GPT-5 is OpenAI's newest flagship, and it was tuned to read more human. The obvious GPT-4-era giveaways are mostly gone, the cadence varies more, and single-word lexical tells are weaker. The signals that hold up are statistical and structural: consistent rhetorical scaffolding, even-handed both-sides hedging, semantic evenness, and perplexity and burstiness distributions that still differ from human writing. TextSight is re-fit on fresh GPT-5 samples, marks the flagged lines with colour-coded highlights, and checks for Gemini and other models in the same scan, with no extra step. Free to try. No card.

Start free, no card · See pricing
Pro at $14.99/mo yearly · Multi-model classifier · No training on your text
Built for GPT-5

Re-fit for the latest OpenAI flagship in a single multi-model scan.

GPT-5 is the default model behind a large share of new ChatGPT output, and it was deliberately tuned to look less machine-written. Most detectors were calibrated on GPT-4-era vocabulary and now underrate GPT-5. TextSight was re-fit on fresh GPT-5 samples and weights the statistical and structural patterns that survive the cleanup, alongside Gemini and other model signals. No detector is perfect on a frontier model, so we frame the score as evidence to weigh, not an automated verdict.

GPT-5 is harder to read than its predecessors precisely because OpenAI removed the easy tells. The much-discussed filler words appear far less often, the rhythm varies more, and a first read can feel genuinely human. That means a detector cannot lean on a banned vocabulary list and expect to catch it. TextSight instead reads how the text is built: its rhetorical shape, the evenness of its emphasis, the way it hedges, and the perplexity and burstiness profile underneath.

One scan, every major model

You do not need to tell TextSight which model produced the text. The classifier reads the prose and flags GPT-5-shaped sentences, Gemini-shaped sentences, and other model-shaped sentences in the same pass. Mixed-source documents (one paragraph drafted in GPT-5, another reworded in Gemini) score correctly because each sentence is scored on its own pattern.

Sentence-level highlights tuned to what GPT-5 still leaks

Colour-coded sentence highlights point to the specific lines that carry GPT-5 markers: the tidy setup-and-resolution scaffolding, the both-sides hedging, and the flat semantic density where every sentence carries a similar load. Reviewers see exactly which sentences drove the score rather than guessing from a single percentage.

API or web surface, same signal

Output from chatgpt.com, the ChatGPT desktop and mobile apps, or any downstream tool wrapped around the OpenAI API carries the same fingerprints. The classifier treats GPT-5 as a model, not as a product surface, so detection works regardless of where the user pasted from.

GPT-5 voice patterns

What still gives GPT-5 away once the easy tells are gone.

The honest starting point: GPT-5 is a deliberately cleaner writer than GPT-4. It dropped most of the giveaway vocabulary, varies its sentence length more, and reads convincingly human on a first pass. So the reliable tells are no longer single words. They are statistical and structural, and they fall into five families that hold up even after light editing. None of them is a verdict on its own; the classifier weighs them together.

Consistent rhetorical scaffolding

GPT-5 reaches for the same shape across answers: a clean orienting setup, a balanced body that walks through the considerations in order, and a tidy summarising resolution. A human writer commits earlier, buries the lede, or trails off; GPT-5 almost always closes the loop. When the same setup-body-resolution arc recurs across paragraphs that should have had different shapes, the structure itself becomes the signal, no buzzword required.

Even-handed both-sides hedging

Ask GPT-5 a question with a clear answer and it still tends to present both sides evenly, qualifying its own claims and granting the counterpoint before moving on. The hedging is symmetrical and almost diplomatic, where a person with a view leans one way and shows it. This balanced neutrality is one of the more durable GPT-5 fingerprints because it comes from how the model was aligned, not from a phrase it can be told to drop.

Semantic evenness and flat information density

Human writing is lumpy. Some sentences carry a lot, others coast. GPT-5 tends to spread information evenly, so each sentence delivers a similar load and the emphasis never spikes. That flatness reads smooth but is statistically distinctive: the per-sentence information curve stays unusually level across a passage. TextSight measures this evenness directly rather than inferring it from vocabulary.

Perplexity and burstiness that still sit in a band

GPT-5 narrowed the gap with human writing on both measures, which is exactly why older detectors miss it. But the distributions have not closed entirely. Word choices remain a little more predictable than a person's, and sentence-length variance, while wider than GPT-4's, still clusters in a measurable range. The classifier reads the shape of these distributions across the whole passage, not any one sentence in isolation.

Faded lexical tells, not absent ones

The famous markers are weaker now. Words like delve appear far less often, and counting them no longer works as a detection strategy. They have not vanished entirely, so they remain a minor input, but TextSight deliberately puts little weight on any fixed vocabulary list. Leaning on word counts is what causes buzzword-based detectors to return falsely low scores on GPT-5; the heavier weight belongs on structure, hedging balance, and the statistical profile above.

Plans & pricing

Pricing for solo reviewers and detection teams.

Pro at $19.99 a month standard, $14.99 a month on yearly, is the right fit for solo editors, instructors, and reviewers running steady individual scans. Business at $39.99 a month standard, $29.99 a month on yearly, fits teams scanning fifty or more pieces a month with shared history and REST API access. Full details on the pricing page.

Free
$0/forever

Try a GPT-5 scan. No card, no email.
  • 3 scans / day
  • 5,000 chars per scan
  • Sentence-level highlights
  • 2 lifetime AI rewriter uses
Start free

Starter
$7.49/month

Billed $89.88/year — Save $30

Light reviewers running a few scans a week.
  • 20 scans / day
  • 20,000 AI rewriter words/mo
  • Chrome extension
  • Email support
Get Starter

Business
$29.99/month

Billed $359.88/year — Save $120

Detection teams. Fifty or more pieces a month.
  • 100,000 AI rewriter words/mo
  • 5 team seats, shared history
  • Audit log, REST API
  • White-label PDFs
Get Business

Yearly billing saves 25%. View full pricing →

Calibration

Why other detectors underrate GPT-5 content.

Detector disagreement on GPT-5 is common, and it usually runs in one direction: older tools score it too low. Many AI detectors were calibrated on GPT-3.5 and GPT-4 output, where a handful of vocabulary markers and a uniform cadence did most of the work. GPT-5 removed exactly those crutches, so a detector tuned to the previous generation reads it as suspiciously human.

Calibrated to the wrong generation

Detectors trained mostly on GPT-4-era output learned the stock transitions, the predictable filler, and the flatter cadence of that generation. GPT-5 writes cleaner, varies its rhythm, and drops the giveaway vocabulary, so it does not light up those features. The detector reads it as low confidence and returns a human-ish score even when the prose is straightforwardly GPT-5.

Buzzword lists are the wrong tool now

A detector built on counting flagged words was always brittle, and GPT-5 makes the weakness obvious. With the obvious markers thinned out, a vocabulary-based classifier has little to fire on. TextSight puts the weight on structure, hedging balance, semantic evenness, and the perplexity and burstiness distribution instead, because those persist after the easy words are gone.

How to read a disagreement

When TextSight reports a high AI score on a paragraph and an older detector reports a low one, the disagreement is usually a calibration gap, not a contradiction. The two tools are reading different distributions, and the older one is looking for tells GPT-5 no longer leaves. Sentence-level highlights make this concrete: a reviewer can point to the specific lines carrying the structural and statistical markers and decide whether to act. No detector is perfect on a frontier model, so the highlights matter more than the headline number.

Re-fit cadence keeps detection current

OpenAI updates GPT-5 and the stylistic distribution drifts. TextSight refits the classifier against fresh GPT-5 samples on a rolling cadence rather than freezing a snapshot. The page you are reading reflects the current distribution, and because the approach reads statistical fingerprints rather than a fixed word list, it degrades gracefully as the model evolves instead of going stale overnight.

Where GPT-5 shows up

Essays, marketing copy, and support replies at scale.

GPT-5 turns up wherever ChatGPT does, which is nearly everywhere. Because it is the default flagship, a large share of new ChatGPT-produced text is GPT-5 output. The highest-volume contexts are student essays, marketing and SEO copy, and customer-support replies, with code documentation close behind. Each context calls for a slightly different read of the scan.

Academic context

Students reach for GPT-5 because it produces a clean, balanced essay that no longer trips the obvious word-level filters. Instructors reviewing submissions see the giveaway differently now: the recurring setup-and-resolution shape across paragraphs, the both-sides hedging on questions that wanted a stance, and the unnaturally even information density. Sentence highlights make the pattern explicit, which is more useful in an integrity conversation than a single percentage.

Marketing and SEO content

Content teams lean on GPT-5 for blog drafts, landing pages, and email sequences because the prose reads polished and the old AI tells are gone. The polish is now the tell. The copy is smooth and competent but oddly shapeless, every paragraph hitting the same rhythm and granting every counterpoint. Reviewers running a pre-delivery scan catch the lift-and-publish case before it ships.

Customer support and product replies

Support teams route a growing share of replies through GPT-5 behind the OpenAI API. The diplomatic, evenly hedged tone is exactly what the model produces by default. Detection here is less about misconduct and more about quality control: flagging canned, un-reviewed replies before they go to a customer, and spotting reviews or product descriptions that were generated wholesale.

Code documentation and internal notes

Engineering teams use GPT-5 to draft README files, API references, and longer technical notes. The structure fits, but the prose around the code carries the same flat evenness and tidy resolution. A quick scan catches documentation that has not been read by a human before publication. This is a quality concern separate from academic integrity, but the same signals surface it.

What you see in a GPT-5 scan

Sentence highlights, paragraph cards, perplexity, and burstiness.

A single percentage is not an evidence trail, and on a frontier model the evidence matters more than the number. The TextSight result panel surfaces which sentences carried GPT-5 markers and why, with paragraph-level rollups for longer pieces, so reviewers can point to specific lines and pair the score with their own judgment.

Sentence-level highlights

Every sentence is colour-coded by its own AI-likeness score. Red sentences clustered through the tidy setup-and-resolution scaffolding and the even both-sides hedging are a stronger signal than scattered yellows. The visual makes the structural pattern legible without forcing a reviewer to study the percentage, which is exactly what you want when the word-level tells are no longer there to point at.

Paragraph cards on Pro

Longer pieces get paragraph-level rollups so reviewers can see which paragraph is driving the headline score. On GPT-5 content this usually points at the body sections, where the flat information density and symmetrical hedging are most concentrated. Starting with the paragraph that contributes most to the score is the fastest way to confirm the read.

Perplexity, read-only on Pro

Perplexity measures how predictable word choices are to a language model. GPT-5 narrowed this gap deliberately, so its numbers run closer to human writing than GPT-4's did, which is the whole reason older detectors miss it. The diagnostic context still helps: an unusually smooth, low-variance perplexity curve across a long passage is informative even when no single sentence looks damning.
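To make the measure concrete, here is a minimal Python sketch of the standard perplexity formula: the exponential of the average negative log-probability per token. The function name and the hand-written probabilities are illustrative assumptions. This is not TextSight's implementation, and a real scan would take its probabilities from a language model.

```python
import math


def perplexity(token_probs: list[float]) -> float:
    """Perplexity of a token sequence, given the probability a language
    model assigned to each token (hypothetical input for illustration)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-probability per token.
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# Made-up probabilities: a predictable passage vs. a more surprising one.
predictable = [0.6, 0.5, 0.7, 0.55]
surprising = [0.1, 0.4, 0.05, 0.3]

print(perplexity(predictable) < perplexity(surprising))
```

Lower values mean the model found the text easier to predict. As the section notes, a single number is weak evidence; the shape of the curve across a long passage carries more information.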

Burstiness, read-only on Pro

Burstiness measures sentence-length variance. GPT-5 varies its sentence length more than earlier models, so high burstiness alone no longer clears a passage. What reads as a signal is the combination: moderate burstiness paired with the structural scaffolding and even semantic density. The diagnostics are context for the verdict, not the verdict, and on a frontier model that distinction is the point.
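As a rough illustration, sentence-length variance can be sketched in a few lines of Python. This version assumes burstiness is the coefficient of variation of words per sentence (standard deviation divided by mean) and uses a naive punctuation-based sentence splitter. Both choices, and the function names, are assumptions for illustration rather than TextSight's method.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    """Word count per sentence, using a naive split on ., ! or ?
    followed by whitespace (an assumption; real splitters are more careful)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length.
    Returns 0.0 when there are fewer than two sentences."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = "One two three. Four five six. Seven eight nine."
varied = "Stop. This sentence is quite a bit longer than the first one."

print(burstiness(uniform) < burstiness(varied))
```

Dividing by the mean keeps the figure comparable across passages with different average sentence lengths. In line with the section above, a value like this is context to weigh alongside structure and semantic density, not a verdict.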

FAQ

Frequently asked questions about GPT-5 detection.

Is TextSight re-fit to detect GPT-5 output specifically?
Yes. TextSight is a multi-model classifier re-fit on fresh GPT-5 samples alongside Gemini and earlier GPT generations. GPT-5 deliberately dropped many GPT-4-era giveaways, so the classifier leans less on single-word lexical tells and more on statistical and structural fingerprints: consistent rhetorical scaffolding, even-handed both-sides hedging, semantic evenness, and perplexity and burstiness distributions that still differ from human writing. You do not tell the scanner which model produced the text; it reads GPT-5-shaped prose by its own patterns. No detector is perfect on a frontier model, so pair the score with your own judgment.
Why is GPT-5 harder to detect than GPT-4?
OpenAI tuned GPT-5 to read more human. It uses the obvious GPT-4-era filler far less (the much-discussed delve is largely gone), varies its cadence more, and produces rhythm that feels closer to a person on a first read. That removes the easy lexical shortcuts older detectors relied on. The signals that hold up are statistical and structural rather than word-level, which is why a detector that was tuned to GPT-4's vocabulary will underrate GPT-5. TextSight was re-fit on GPT-5 output to weight the patterns that actually persist.
What still gives GPT-5 away if the word-level tells are gone?
Structure and statistics. GPT-5 tends to reach for the same rhetorical scaffolding across answers: a clean setup, a balanced middle, and a tidy resolution. It hedges both sides of a question evenly even when a human writer would commit. Its semantic density stays unusually flat, with each sentence carrying a similar information load rather than the uneven emphasis people produce. And its perplexity and burstiness distributions, while closer to human than GPT-4's, still sit in a measurable band. TextSight reads those distributions rather than hunting for a banned word list.
Does the loss of 'delve' and other buzzwords break detection?
It breaks word-list detection, not statistical detection. Counting flagged vocabulary was always a brittle approach, and GPT-5 makes that clear by writing cleaner copy with fewer obvious markers. A detector built on a buzzword list will return falsely low scores on GPT-5. TextSight never relied on a fixed vocabulary list as the verdict; lexical signals are one weak input among many, and the heavier weight sits on structure, hedging balance, semantic evenness, and the perplexity and burstiness profile that GPT-5 still carries.
Does TextSight detect GPT-5 alongside Gemini and other models in one scan?
Yes. The classifier is multi-model by design. A single scan flags OpenAI GPT-5, Google Gemini, and other large language models without you pre-selecting a target. This matters for mixed-source content where one section was drafted in GPT-5, another reworded in Gemini, and a third paragraph written by hand. Sentence-level highlights show which lines reacted regardless of the source model, so you can act on specific evidence rather than a single headline number.
Where does GPT-5 output usually show up?
Wherever ChatGPT does, now on the latest model. GPT-5 reaches users through chatgpt.com, the ChatGPT desktop and mobile apps, and the OpenAI API behind countless writing tools, code editors, and customer-support widgets. Because it is the default flagship, a large share of new ChatGPT-produced text is GPT-5 output. It flows into student essays, marketing copy, support replies, product descriptions, and code documentation. TextSight reads the prose regardless of which surface produced it.
How accurate is TextSight on GPT-5 output?
No detector is perfect on a frontier model, and GPT-5 is harder than its predecessors because OpenAI tuned out the easy tells. TextSight is re-fit on fresh GPT-5 samples and reads statistical and structural fingerprints rather than a vocabulary list, which keeps it more durable than buzzword-based tools. We work to keep false positives on native human English writing low, though no detector eliminates them entirely. We do not publish a fixed GPT-5 accuracy number because frontier distributions drift; treat the score as strong evidence to weigh alongside your own judgment, not an automated verdict.
Which TextSight tier fits GPT-5 detection workloads?
Pro at $19.99 a month standard, or $14.99 a month on yearly, is the right fit for solo reviewers, editors, and instructors running individual scans across a steady inbound flow. It unlocks unlimited scans, a 10,000 character cap per scan, 90-day scan history, file upload, and the integrated AI rewriter. Business at $39.99 a month standard, or $29.99 a month on yearly, fits teams scanning fifty or more pieces a month with five seats, REST API access, an audit log, and white-label PDFs.

Scan GPT-5 content with a classifier re-fit to read it.

Free to try. No card. Pro at $14.99 a month on yearly for solo reviewers; Business at $29.99 a month on yearly for detection teams.

Multi-model classifier · Re-fit on fresh GPT-5 samples · Sentence-level highlights · No training on your text