Written for journal editors, peer reviewers, and dissertation committees who need a defensible call rather than a single percentage. A research paper is not a blog post with footnotes. Literature reviews are citation-heavy and read formal. Methods are templated by genre convention. Discussions reward fluent abstract prose, which is exactly what frontier models produce by default. Scanning a whole manuscript as one block and reading one number averages those very different baselines into noise. The five-step section-by-section workflow below pairs TextSight's sentence-level signal with iThenticate cross-verification and an ESL-aware calibration, so the case you bring to the author is built on per-section evidence rather than a global score.
Each section is written to its own genre convention. A single percentage flattens those very different baselines into a number that hides where the signal actually lives. Reading the paper section by section is what separates triage from evidence.
The Abstract is the shortest section and the most outsourced. Authors under deadline pressure regularly draft the abstract last and reach for a chatbot when time runs short. The Abstract is also where an AI-tell vocabulary cluster shows up most densely, because the section is short enough that two or three frontier-favourite words read as a cluster. Treat a high Abstract score paired with clustered red highlights as one of the strongest signals on the page.
In the Literature Review, citation-heavy paraphrase reads as formal and templated. Parenthetical citations every two or three sentences, author-date constructions, and the convention of summarising other researchers' work all push the statistical signature toward AI-adjacent territory. Expect the Lit Review to flag higher than the rest of the paper even when a human wrote it. Discount the Lit Review score by roughly the gap you observe between Lit Review and Methods scores in clearly human-written reference papers; do not drop it to zero.
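The Lit Review discount can be sketched as simple arithmetic. This is a minimal illustration, not TextSight's own calculation: the function names, the dictionary keys, and the 0-100 score scale are assumptions made for the example, and the reference scores would come from papers you already know to be human-written.

```python
from statistics import mean

def lit_review_discount(reference_papers):
    """Mean gap (Lit Review minus Methods) across human-written reference papers.

    Each entry is a hypothetical dict with "lit_review" and "methods" scores
    on an assumed 0-100 scale. A negative mean gap is treated as no discount.
    """
    gaps = [paper["lit_review"] - paper["methods"] for paper in reference_papers]
    return max(mean(gaps), 0)

def discounted_lit_review(score, discount):
    """Lower the Lit Review score by the baseline gap.

    The score is reduced, never discarded: the adjusted value still counts
    toward the per-section picture.
    """
    return max(score - discount, 0)
```

With reference gaps of 10 and 14 points, the discount is 12, so a Lit Review score of 50 would be read as 38 rather than ignored.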
Methods sections are templated by design. Uniform third-person, narrow technical vocabulary, repeated sentence structures, no figurative language: those are the conventions of good methods writing. They also overlap with the statistical signature of AI prose. A clean human-written methods section can score higher than the same author's discussion. Read methods scores as ambient noise unless something specific stands out, and put the diagnostic weight elsewhere.
Results sections are partially protected. Reporting specific procedures, numbers, instruments, and outcomes constrains the prose in ways that frontier models do not naturally replicate. A high Results score is unusual and worth a second look; a low Results score is the expected baseline rather than evidence of authenticity.
The Discussion is the most variable section and the most diagnostic. Discussions reward fluent generalised abstraction, which is exactly what chatbots produce by default. A discussion section that flags substantially higher than the methods section, especially with clustered red highlights and AI-favoured vocabulary, is the cleanest signal the paper offers. This is where the section-by-section workflow earns its weight.
The workflow takes roughly fifteen minutes per manuscript once you are practised. It is designed so the cheap steps catch the easy cases, and the expensive step, the conversation with the author, is reserved for papers where the evidence has converged.
Paste the manuscript into TextSight at app.textsight.ai. The calibrated ML classifier returns a sentence-level highlight map and per-section density in seconds, before any plagiarism queue has started. This is the pre-Turnitin and pre-iThenticate signal: a fast generative-AI read that tells you whether the prose was generated, which is a separate question from whether it was lifted. The free tier is enough for one paper a day; Pro at $14.99 a month on yearly billing removes the cap and is the right plan for any editor handling more than a manuscript a week.
Split the manuscript into Abstract, Literature Review, Methods, Results, and Discussion, and look at the density of each section rather than a single global percentage. The pattern matters more than the headline number. A paper where the Discussion section is twice as densely flagged as the Methods section is telling you something the one-shot scan would have buried.
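The Discussion-to-Methods comparison can be sketched in a few lines. This is an illustration under stated assumptions, not a TextSight API: the section keys, the function names, and the representation of density as a fraction of flagged sentences are all hypothetical, and the 2.0 threshold simply mirrors the "twice as densely flagged" example above.

```python
SECTIONS = ("abstract", "literature_review", "methods", "results", "discussion")

def discussion_to_methods_ratio(densities):
    """Ratio of Discussion density to Methods density.

    `densities` maps each section name to the fraction of its sentences
    that were flagged (0.0 to 1.0). The values are assumed to be copied by
    hand from the per-section report.
    """
    missing = [name for name in SECTIONS if name not in densities]
    if missing:
        raise ValueError(f"missing section densities: {missing}")
    methods = densities["methods"]
    discussion = densities["discussion"]
    if methods == 0:
        # Avoid dividing by zero: any flagged Discussion is infinitely denser.
        return float("inf") if discussion > 0 else 0.0
    return discussion / methods

def discussion_stands_out(densities, threshold=2.0):
    """True when the Discussion is at least `threshold` times as dense as Methods."""
    return discussion_to_methods_ratio(densities) >= threshold
```

A paper with a Methods density of 0.25 and a Discussion density of 0.5 yields a ratio of 2.0 and is worth a closer read, even if its global percentage looks unremarkable.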
Read the red-highlighted sentences in each section. Do they cluster in one paragraph or scatter across the section? Clustered red sentences in the Discussion or Abstract are a stronger signal than the same percentage spread thinly across the paper. The highlight map is the diagnostic layer; the headline percentage is triage. Note the specific sentences you would quote in a follow-up conversation.
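One way to make "clustered versus scattered" concrete is to look at the longest run of consecutive flagged sentences. This is a rough sketch, not how the highlight map is computed: the function names are hypothetical and the input is assumed to be a list of booleans, one per sentence in reading order, that you record yourself from the highlights.

```python
def longest_flagged_run(flags):
    """Length of the longest run of consecutive flagged sentences."""
    best = current = 0
    for flagged in flags:
        current = current + 1 if flagged else 0
        best = max(best, current)
    return best

def clustering_summary(flags):
    """Summarise how concentrated the flagged sentences are.

    `run_share` is the fraction of all flagged sentences that sit in the
    single longest run: near 1.0 means clustered, near 0 means scattered.
    """
    total = sum(1 for flagged in flags if flagged)
    run = longest_flagged_run(flags)
    return {
        "flagged": total,
        "longest_run": run,
        "run_share": run / total if total else 0.0,
    }
```

Two sections with the same 40 percent of sentences flagged can look very different here: four flagged sentences in a row give a run share of 1.0, while four isolated ones give 0.25.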
Send the manuscript through iThenticate or Turnitin in parallel for plagiarism similarity. The two outputs answer different questions: TextSight tells you whether the prose was generated; iThenticate tells you whether it was lifted from indexed sources. A paper that flags on both warrants a far closer conversation than one that flags on only one. Editors at Nature, Science, The Lancet, and JAMA already run AI screening; pair it with the similarity layer rather than treating either signal as a verdict on its own.
Treat the result as a conversation-starter, not a verdict. Open an exchange about drafting process, AI-assistance disclosure, and the specific section-level pattern you observed. Lead with per-sentence highlights and section densities rather than a global percentage; you get more honest answers and your case holds up better on appeal. A genuine author can reconstruct their drafting process in five minutes; the absence of that reconstruction is usually more diagnostic than the score itself.
Free includes 3 detector scans a day and a 1,500-word AI rewriter quota. Paid tiers raise the quotas and add the Chrome extension, file upload, and REST API. Yearly billing saves 25%.
Yearly billing on the three paid tiers comes to $89.88/year (save $30), $179.88/year (save $60), and $359.88/year (save $120).
Turnitin and iThenticate answer the plagiarism question. TextSight answers the generative-AI question. Running the AI scan first, while the similarity check runs in parallel, is what gives editors the converging evidence a defensible reviewer note needs.
Plagiarism engines surface text reuse from indexed sources. They were not built to recognise generated prose that never appeared in any indexed corpus, which is precisely what frontier models produce by default. A paper drafted in ChatGPT and never copy-pasted from a public source can pass Turnitin and iThenticate cleanly while being almost entirely AI-generated. The sentence-level generative signal that TextSight returns in seconds catches that case before the similarity queue completes.
The strongest reviewer position combines both outputs. A paper that flags on TextSight only is a generative-AI question and warrants a conversation about drafting process and AI-assistance disclosure. A paper that flags on iThenticate only is a similarity question and warrants the standard plagiarism conversation. A paper that flags on both is the strongest case and warrants the closest follow-up. Reviewers who treat the two outputs as a single converging picture, rather than as competing verdicts, build cases that hold up on appeal.
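The two-signal reading above amounts to a small decision table. The sketch below is illustrative only: the function name and the boolean inputs are assumptions, each flag stands for your own judgement that a tool's output warrants attention, and the branch for a paper that flags on neither tool is an assumption, since the text does not cover it.

```python
def follow_up(textsight_flag: bool, ithenticate_flag: bool) -> str:
    """Map the two independent signals to the kind of follow-up they warrant."""
    if textsight_flag and ithenticate_flag:
        return "closest follow-up: both generative-AI and similarity questions"
    if textsight_flag:
        return "conversation about drafting process and AI-assistance disclosure"
    if ithenticate_flag:
        return "standard plagiarism conversation"
    # Assumption: not covered by the source text.
    return "no follow-up indicated by either signal"
```

Keeping the two inputs separate, rather than merging them into one score, preserves the point that they answer different questions.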
Nature, Science, The Lancet, JAMA, and several other major journals have published AI-use disclosure policies and routinely run automated screening on submissions. Editors at smaller journals and conference programme committees increasingly do the same. The screening layer is not a verdict; it is triage. Reviewers and editors still receive a flagged paper and have to decide what to do with it. A section-by-section workflow with per-sentence evidence is what separates a defensible reviewer note from a single-percentage gotcha that loses on appeal.
For any journal with international authors, which is almost every journal that publishes, the ESL caveat is the single most important calibration on this page. Get this wrong and the workflow produces unjust outcomes regardless of how clean the score looks.
Multiple peer-reviewed studies published since 2023 have shown that off-the-shelf AI detectors flag English-as-a-second-language writing as AI-written at roughly three to five times the rate of native-English writing on the same task. The reason is structural rather than accidental. Learned-second-language English uses more uniform sentence shapes, a narrower active vocabulary, and a more formal register, all of which overlap with the statistical signature classifiers were trained to recognise. The detector is not failing; it is correctly measuring something that happens to mean a different thing for ESL authors than for native ones.
TextSight trains on diverse English varieties rather than only US academic prose, which narrows the structural overlap by roughly 40 percent against open-source baselines. The practical effect is a lower false-positive rate, not a zero false-positive rate. No detector eliminates the overlap; the best ones narrow it. If you know the author is writing in a second language, weight the score cautiously and lean on sentence-level evidence in the Discussion rather than the Lit Review, since the vocabulary-cluster and clustered-highlight signals are more language-neutral than the burstiness or hedge-density signals.
If your journal publishes international authors, build the calibration into the workflow rather than into the score. Drop a flagged score by 15 to 20 points for ESL authors before deciding what tier it falls into. Require clustered sentence-level highlights in the Discussion plus a vocabulary cluster plus an iThenticate hit before treating an ESL paper as a high-confidence generative-AI case. For any high-stakes decision, including rejection or sanction, never act on the score alone; lead with the per-sentence evidence and an honest conversation about drafting process.
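The ESL calibration can be written down as two small rules. This is a minimal sketch under assumptions: the function names are hypothetical, the score is assumed to be on a 0-100 scale, and the three boolean inputs are the reviewer's own readings of the evidence rather than values returned by any tool.

```python
def calibrated_score(score, esl_author, esl_adjustment=15):
    """Lower a flagged score for ESL authors before assigning it to a tier.

    The adjustment is restricted to the 15 to 20 point range given in the
    text; the result is clamped so it never goes below zero.
    """
    if not 15 <= esl_adjustment <= 20:
        raise ValueError("esl_adjustment must be between 15 and 20 points")
    if not esl_author:
        return score
    return max(score - esl_adjustment, 0)

def high_confidence_esl_case(discussion_cluster, vocabulary_cluster, ithenticate_hit):
    """All three signals must be present before an ESL paper counts as high-confidence."""
    return discussion_cluster and vocabulary_cluster and ithenticate_hit
```

A score of 70 for an ESL author is read as 55 with the default adjustment, and even then it only becomes a high-confidence case when all three corroborating signals are present. Neither function replaces the per-sentence evidence or the conversation with the author.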
The detector did not catch the author. The detector flagged sections for closer reading, which the reviewer then evaluated against per-sentence evidence and section-level patterns. Framing the result as a conversation-starter rather than a verdict is what separates reviewer authority from reviewer overreach.
Bring four things to the conversation: the section-by-section densities, the specific highlighted sentences and the markers they match, the iThenticate cross-verification result, and a request to walk through the drafting process. A genuine author can reconstruct their process in five minutes: where the idea started, which sections were drafted first, what AI assistance was used and how it was disclosed. The reconstruction is usually more diagnostic than the score itself. Reviewers who lead with the evidence rather than the percentage hold up better on appeal and get more honest answers.
Avoid three things: a single global percentage with no per-section breakdown, a confident verdict on the strength of one number, and a demand for explanation phrased as an accusation. Reviewers who treat the score as the case rather than as the trigger for evidence-building lose authority the moment the author pushes back, and the journal loses the ability to act if the case turns out to be real. The score opens the conversation; it does not close it.
Escalate to the editor when the evidence has converged: clustered red highlights in two or more sections, a vocabulary cluster across Abstract and Discussion, an iThenticate hit on the same passages, and a drafting-process explanation that does not match the per-section pattern. Each one of those signals on its own is information, not evidence. Two of them is a question worth asking the author. Three or more is a case worth escalating.
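The escalation rule is a count over four signals, which can be sketched as follows. The class and field names are hypothetical labels for the four signals listed above, each value is the reviewer's own judgement, and the zero-signal branch is an assumption because the text does not address it.

```python
from dataclasses import dataclass, astuple

@dataclass
class Signals:
    clustered_highlights_in_two_sections: bool
    vocabulary_cluster_abstract_and_discussion: bool
    ithenticate_hit_on_same_passages: bool
    drafting_explanation_mismatch: bool

def next_step(signals: Signals) -> str:
    """Turn the number of converging signals into the next action."""
    count = sum(1 for present in astuple(signals) if present)
    if count >= 3:
        return "escalate to the editor"
    if count == 2:
        return "ask the author"
    if count == 1:
        return "information only"
    # Assumption: no signals present, nothing to follow up.
    return "no action"
```

For example, clustered highlights plus a vocabulary cluster, with no iThenticate hit and a drafting explanation that holds together, comes out as "ask the author" rather than an escalation.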
The methodology guide. The sister guide to this page: six manual signals, three confidence tiers, and the underlying classifier families.
The professors page. The audience landing page: classroom and journal use cases, batch review patterns, and ESL calibration for grading.
The detector. Run a real section-by-section scan: sentence-level highlights, calibrated overall score, bundled Plagiarism Risk.
The educator guide. The classroom workflow built on this methodology: per-student review, ESL calibration, integrity conversations.
Calibrated ML classifier with per-section densities and sentence-level highlights. Free to try with no card: 3 detector scans a day, the full evidence layer on every result, ESL-aware calibration on by default.