Score your research paper for AI — pre-iThenticate calibration.

Paste a chapter, a section, or a full manuscript, see an Authenticity Score on a 0 to 100 scale, and read which specific sentences carry the AI signal. Scoring is calibrated section by section: Methods, which is uniform on purpose, is not penalised the way generic detectors penalise it, and Discussion, where genuine ChatGPT use tends to show up, is weighted more heavily. The recommended pre-flight for a journal submission: scan, revise the flagged paragraphs in your own voice, re-scan, then run iThenticate and Crossref Similarity Check before you submit. ESL-aware. .edu Pro is $13.99.

1,500 words free quota · Section-by-section · ESL 40% fewer FPs · .edu Pro $13.99
The pre-submission workflow

Scan section-by-section, not whole-paper.

A research paper is not one document. It is seven sections with seven different AI-tell profiles. A whole-paper score averages those baselines and tells you almost nothing about where the revision work is. Section-by-section is the workflow that actually moves a manuscript from flagged to defensible.

Step 1: Scan the Abstract on its own

Paste the abstract first. It is the highest-risk single block in the manuscript because reviewers and screeners read it before anything else, and because ChatGPT abstracts follow a tight four-move template (background, gap, method, contribution) that classifiers weight heavily. Aim for above 75 on the abstract before you move on to the body. A 250-word abstract can be 90 percent honest writing yet still flag because of one polished closing sentence; the scan finds those.

Step 2: Walk the body sections in IMRaD order

Introduction, Literature Review, Methods, Results, Discussion, Conclusion. Paste each section separately rather than the whole paper at once. The sentence-level highlights tell you which paragraphs are pulling the section score down. In a typical ChatGPT-assisted manuscript, two to four paragraphs across the whole paper carry most of the AI signal, and the rest is fine. Targeting those paragraphs is the difference between an honest revision and a structural rewrite.

Step 3: Revise the flagged paragraphs in your own voice

Open your manuscript alongside the highlights and rewrite the red sentences before reaching for any rewrite tool. Read each aloud. Replace one abstract claim per paragraph with a specific number, a named instrument, or a concrete observation from your own data. Vary sentence length so two adjacent sentences are not both in the 18 to 24 word range that classifiers weight. For a stubborn sentence inside a precision-critical span (a procedural clause, a definition), run the AI rewriter in Light mode on just that sentence rather than the whole paragraph.
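
The sentence-length advice in this step can be checked mechanically before you re-scan. The sketch below is a rough illustration, not TextSight's classifier: it assumes a naive regex sentence splitter and whitespace word counts, and the function names are hypothetical. It lists adjacent sentence pairs that both fall in the 18 to 24 word range mentioned above.

```python
import re

# The 18 to 24 word range comes from Step 3; everything else is an assumption.
LOW, HIGH = 18, 24

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def adjacent_midlength_pairs(text, low=LOW, high=HIGH):
    """Return (index, len_a, len_b) for adjacent sentences both within [low, high] words."""
    lengths = [len(s.split()) for s in split_sentences(text)]
    pairs = []
    for i in range(len(lengths) - 1):
        a, b = lengths[i], lengths[i + 1]
        if low <= a <= high and low <= b <= high:
            pairs.append((i, a, b))
    return pairs
```

Each returned index points at the first sentence of a pair worth varying. Abbreviations such as "et al." will confuse the splitter, so treat the output as a prompt for rereading rather than a verdict.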

Step 4: Re-scan, then run iThenticate and Crossref pre-flight

Paste the revised sections back in and verify that each section score has lifted: above 75 on Introduction and Discussion, above 70 on Results, above 65 on Methods (Methods runs lower by design). Then run iThenticate or your institutional Crossref Similarity Check report as a separate pre-flight before you submit. The two checks cover different risks and rarely overlap; together they cover most of what desk review actually screens for. TextSight does not interact with either pipeline; we report our own score honestly so you can decide whether the manuscript is ready.
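
If you keep your section scores in a notes file, the re-scan check can be written down as a small lookup. This is a minimal sketch: the targets are the ones quoted in Steps 1 and 4, the dictionary and function names are hypothetical, and sections without a stated target (Literature Review, Conclusion) are simply skipped.

```python
# Targets quoted in the workflow steps; scores must be ABOVE these values.
SECTION_TARGETS = {
    "Abstract": 75,
    "Introduction": 75,
    "Discussion": 75,
    "Results": 70,
    "Methods": 65,
}

def sections_below_target(scores, targets=SECTION_TARGETS):
    """Return {section: (score, target)} for scored sections that are not above target."""
    return {
        name: (score, targets[name])
        for name, score in scores.items()
        if name in targets and score <= targets[name]
    }
```

For example, `sections_below_target({"Introduction": 71, "Methods": 66})` reports only the Introduction, which is the section to revise before the similarity pre-flight.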

The seven sections

Each IMRaD section has a different AI-tell profile.

The scorer was calibrated against a corpus of ChatGPT-assisted manuscripts across STEM, life sciences, and social sciences. The pattern that shows up in an Abstract is not the pattern that shows up in a Discussion. Knowing the profile per section helps you spend revision time where it actually moves the score.

Abstract

ChatGPT abstracts follow a four-move template: background, gap, method, contribution. Each move is one sentence of 22 to 28 words, and the transitions are explicit. The fix is to compress background and gap into one sentence and lead with the finding, not the field. This is the single highest-yield rewrite in the manuscript because the abstract sets the reviewer's reading frame.

Introduction

The opening sentence is the biggest tell. Sweeping openers ("In the rapidly evolving landscape of," "This paper presents," "This study investigates") appear in about 70 percent of generated introductions. Replace them with a concrete finding that surprised you, a counter-intuitive observation, or a specific data point from your own results. Scan the first 200 words as a separate test and treat anything under 70 as a rewrite candidate.

Literature Review

The section most prone to over-flagging, because it is citation-heavy and chronological. AI-generated lit reviews summarise one paper per sentence in citation order. Real reviews group three or four studies together by claim. Re-group by argument, keep citation tokens exact, and the section score usually moves 30 to 50 points without losing scholarly density.

Methods

The cleanest section by default in honest writing, the riskiest if AI-generated. Dense technical prose with equations, variable names, and assay codes absorbs the template signal, so a Methods score around 65 is normal. The classifier knows it is reading a Methods section if the structural markers are present and adjusts thresholds accordingly. A Methods score around 85 with Discussion at 45 is the worry pattern; that combination is what real undisclosed AI use looks like.

Results

Statistical reporting language is templated for the same reason Methods is, and reviewers expect it. The flag risk is in the transitional sentences between table walks. The scorer focuses on those and leaves the table-walk language alone. Citations are stripped before scoring and re-inserted afterwards, so a passage with 40 citations in 600 words is scored on the underlying prose, not the citation noise.
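
To make the strip-and-re-insert idea concrete, the sketch below masks citations with numbered placeholders and restores them afterwards, character for character. It is not TextSight's implementation: the regex covers only two assumed citation styles (bracketed numerals such as [12] or [3, 4], and parenthetical author-year such as (Smith et al., 2020)), and `mask_citations` and `restore_citations` are hypothetical names.

```python
import re

# Assumed citation styles only; real manuscripts will need a broader pattern.
CITATION_RE = re.compile(
    r"\[\d+(?:\s*[,-]\s*\d+)*\]"          # numeric: [12], [3, 4], [5-7]
    r"|\([A-Z][^()]*?,\s*\d{4}[a-z]?\)"   # author-year: (Smith et al., 2020)
)

def mask_citations(text):
    """Replace each citation with a placeholder; return the masked text and the originals."""
    found = []

    def _swap(match):
        found.append(match.group(0))
        return f"__CIT{len(found) - 1}__"

    return CITATION_RE.sub(_swap, text), found

def restore_citations(masked, found):
    """Put the original citation tokens back exactly as they were."""
    return re.sub(r"__CIT(\d+)__", lambda m: found[int(m.group(1))], masked)
```

The same round trip is useful when you revise by hand: mask, rewrite the prose around the placeholders, restore, and the citation tokens stay exact.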

Discussion

The section that needs the most attention. Synthesising results, comparing to prior work, hedging limitations, and projecting future directions are exactly the tasks a tired author hands to ChatGPT. The output reads fluent and confident but lacks specifics. If your Discussion contains "underscoring the multifaceted nature of," "navigating the complexities of," or "this study contributes to a growing body of literature," scrub those first. Aim for above 80 here if you want margin.

Conclusion

"In conclusion, this study has demonstrated" is the single most common AI tell in academic prose. Short enough to rewrite from scratch if the score sits below 70. Drop the synthesis closer ("collectively underscore," "pave the way for"), state the one finding that matters most, and name the specific next experiment instead.

Reading the score bands

What each Authenticity Score range means for a journal submission.

A number on its own does not tell you whether to submit. These five bands describe what the classifier is seeing per section, what publisher screeners tend to do with the same manuscript, and what the right next move is at each band.

80-100: Submission ready

Reads strongly human across all sections. Journal AI screeners are unlikely to flag the manuscript at the submission gate. Reviewers will read the prose without their AI-suspicion antennae triggered. Submit with the disclosure statement your target journal requires if you used AI assistance at any drafting stage.

70-79: Safe for most submissions

Acceptable for conference papers and most journals. For Nature, Science, Cell, JAMA, Lancet, NEJM, and the field-leading venue in your discipline, push Introduction and Discussion to 80 or above with one more editing pass. Methods sections scoring in this range are normal and need no intervention.

55-69: Mixed signal, check by section

Acceptable for Methods and Results given their structural uniformity. Not acceptable for Introduction, Discussion, or Abstract; those need to come up. Treat the discursive sections as the priority. A paper with Methods at 60 and Discussion at 78 is healthy; the reverse is the warning sign.

35-54: Publisher screener will likely flag

High probability that a journal AI screener flags the paper at the submission gate and returns it with a request to clarify AI assistance. If you used AI, disclose explicitly. Either way the discursive sections need rewriting before resubmission. Methods or Results scoring this low is unusual and suggests procedural language was AI-generated, not just prose.

0-34: Do not submit yet

Almost certainly raw or lightly-edited ChatGPT across the discursive sections. Any reasonable screener catches this. Submitting at this score risks desk-reject and, in graduate-thesis contexts, a referral to academic integrity. The fix is a substantive rewrite of Introduction and Discussion, not a quick edit. Use the sentence-level highlights to find every triggered paragraph.
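
The five bands can be expressed as a simple lookup, which is handy if you log scores per section across revision rounds. A minimal sketch, using the band boundaries and labels listed above; the function name `score_band` is hypothetical.

```python
# Lower bound of each band and its label, highest band first.
BANDS = [
    (80, "Submission ready"),
    (70, "Safe for most submissions"),
    (55, "Mixed signal, check by section"),
    (35, "Publisher screener will likely flag"),
    (0, "Do not submit yet"),
]

def score_band(score):
    """Return the band label for an Authenticity Score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, label in BANDS:
        if score >= floor:
            return label
```

Remember that the band is read per section: a Methods score in "Mixed signal" is normal, while the same band on Discussion calls for another pass.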

The 2025-2026 policy landscape

What major journals actually screen for.

Between 2024 and 2025, every major publisher updated its author guidelines on generative AI. The policies converge on the same line: AI assistance for outlining, summarising prior work, and language polishing is allowed if disclosed; AI-generated substantive content is not. A pre-submission scan catches sentences that cross that line before a reviewer does.

Nature, Science, Cell, Lancet, JAMA, NEJM

Disclosure required in methods or acknowledgments. LLMs may not be listed as authors. Internal classifier screening before peer review is documented at Nature and operates at several of the others without specific disclosure. A flag triggers an editor query about your AI use and can delay the review timeline by weeks.

IEEE and ACM

AI-use statement required on every submission, naming the model and the sections it touched. ACM extends the policy to revisions, conference papers, and workshop submissions. IEEE flagged roughly 4 percent of its 2024 submissions for AI-content review based on internal screening, per its own published numbers.

ACS, RSC, Elsevier, Wiley, Springer, PLoS

Policies tightened in early 2025. ACS prohibits AI use for creating or altering scientific content and screens with both internal and third-party tools. Elsevier, Wiley, and Springer require disclosure across their journal portfolios. PLoS journals require a specific statement about whether AI tools contributed to text, images, or analysis.

iThenticate and Crossref Similarity Check pre-flight

The AI scan covers one risk; similarity screening covers a different one. Most journals run iThenticate or a Crossref Similarity Check report on submissions, which compares your manuscript against published literature and detects plagiarism or self-plagiarism. Pre-flighting both before submission is the sober move; the two reports rarely overlap and together they cover most of what desk review actually checks.

Policies evolve quickly. Always check the journal's instructions for authors as published at the time of your submission.

For non-native English researchers

ESL-aware scoring, because false positives end careers.

Most international researchers writing in English face structurally higher detector risk because non-native academic register overlaps with the patterns detectors learn from AI output. The scorer on this page is calibrated for that, not against it.

The bias is real and we measured it

In our internal evals against a sample of human-written ESL research-paper sections, the average competitor detector returned a false-positive rate around 18 to 22 percent. The TextSight detector returns roughly 11 to 13 percent on the same sample. That is around 40 percent fewer false positives, not zero, and we report this honestly because the gap matters for international PhD students, postdocs, and researchers whose first language is not English. The score you see is the same score paid users see.
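
The "around 40 percent" figure is a relative reduction, not a difference in percentage points. The sketch below reproduces the arithmetic from the ranges quoted above, assuming the headline number compares the midpoints of the two ranges; that assumption is ours for illustration.

```python
def relative_reduction(baseline_rate, new_rate):
    """Fraction by which new_rate is lower than baseline_rate."""
    return (baseline_rate - new_rate) / baseline_rate

# False-positive ranges quoted above, in percent.
competitor = (18, 22)
textsight = (11, 13)

# Assumption: the headline figure compares range midpoints (20 vs 12).
midpoint = relative_reduction(sum(competitor) / 2, sum(textsight) / 2)
narrowest = relative_reduction(competitor[0], textsight[1])  # 18 vs 13
widest = relative_reduction(competitor[1], textsight[0])     # 22 vs 11

print(f"midpoint: {midpoint:.0%}, range: {narrowest:.0%} to {widest:.0%}")
```

At the midpoints the reduction is 40 percent; taking the ends of the quoted ranges, it spans roughly 28 to 50 percent.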

How the workflow shifts for ESL researchers

The same four steps, with the emphasis on Step 3 (revising the flagged paragraphs) rather than Step 4 (re-scanning). Read each flagged sentence aloud; ESL researchers gain more from this exercise than native writers because it surfaces sentences where formal academic register collided with non-native phrasing in a way that reads as AI to the classifier. When you reach a rewrite pass, the tool defaults to Light mode and adjusts vocabulary away from idiomatic native-speaker phrasing so your second-language voice stays intact rather than getting flattened toward a register you do not use.

Your voice includes your second-language voice

We do not try to make ESL manuscripts sound like native-speaker manuscripts. That would erase the writer. The goal is the same authentic-voice goal as for any researcher: catch sentences where assistant register leaked in, revise them in your own voice (including your second-language voice), and submit a manuscript that reads like you wrote it. The score is a pre-submission check, not a fluency exam.

Plans & pricing

Same scorer at every tier.

Sentence-level highlights and section calibration are available on every plan. Graduate students, postdocs, and faculty with a verified .edu email get Pro at $13.99 per month instead of $19.99. Full details on the pricing page.

Free
$0/forever

Score one abstract or section, no card.
  • 1,500 word quota
  • Sentence-level highlights
  • Section-by-section calibration
  • Citations preserved

Starter
$7.49/month
Billed $89.88/year, save $30

For a single manuscript or revision cycle.
  • 20,000 AI rewriter words/mo
  • Chrome extension
  • ESL-aware calibration
  • Email support

Business
$29.99/month
Billed $359.88/year, save $120

For research groups, labs, and supervisors.
  • 150,000 AI rewriter words/mo
  • REST API access
  • 5 team seats
  • White-label PDF reports

Yearly billing saves 25%.

Ethical scope

Pre-submission sanity check, not a screener workaround.

Research papers are the use case where the line between legitimate AI-assisted writing and academic dishonesty matters most, because the reputational and disciplinary stakes are highest. We want to be explicit about which side of that line this scorer sits on.

What the scorer is built for

Manuscripts you authored, where ChatGPT was used as an outline assistant, a literature summariser, or a language polisher inside your journal's disclosure policy. The research is yours, the analysis is yours, the argument is yours. The scorer catches sentences where the assistant register leaked into the prose so the submitted manuscript reads in your voice rather than the ChatGPT voice. This is closer to a careful proofread than to anything else.

Not a Nature, Elsevier, or iThenticate workaround

We make no promise that TextSight will get any specific manuscript past Nature's classifier, Elsevier's screener, iThenticate, or any other journal pipeline. We report our own score honestly and explain what it means. If a section is mostly ChatGPT and only lightly edited by you, the scan will tell you that, and no AI rewriter pass will magically fix it; a rewriter cannot supply authentic analysis that was not there. The score and the highlights are diagnostic.

Disclosure is non-negotiable

Even after a clean scan, if you used ChatGPT for outlining, lit-review summarising, or language polishing, disclose it in the methods or acknowledgments as your target journal's policy requires. Detection of undisclosed use is a far bigger problem than disclosed-and-cleaned-up use. The scorer is not a substitute for the disclosure statement; it is the polish step you run before the disclosure statement.

For PIs and supervisors reading this page

If you are advising on whether TextSight is appropriate for your group, the framing is: same scope as a grammar checker or a journal language-editing service. Legitimate as a self-check on disclosed-use language polish, not legitimate as a way to disguise generated substantive content. The scorer and AI rewriter are available for lab-wide use at the Business rate, with 5 seats and a 90-day audit trail.

FAQ

Score your research paper for AI, frequently asked.

What does the research-paper AI score actually measure?
TextSight returns an Authenticity Score on a 0 to 100 scale: 100 reads fully human to the classifier, 0 reads fully AI. The number aggregates probabilities across five signals: paragraph templating, sentence-length variance, vocabulary fingerprint, punctuation signature, and hedge density. For research papers the score is most meaningful section-by-section because each IMRaD section has a different baseline. The sentence-level colour map is what you act on; the number is the headline.
Why does my Methods section flag as AI when I wrote it myself?
Methods sections are deliberately uniform. The IMRaD convention rewards passive voice, fixed clause patterns, and standardised procedural language because reviewers need to reproduce the work. Generic detectors read this uniformity as low burstiness and flag the section even when no AI was used. TextSight calibrates Methods differently and weights AI-tell signals from Introduction and Discussion more heavily, where genuine ChatGPT use is far more common. A Methods score around 65 with Discussion around 80 is the healthy pattern, not the reverse.
How does this fit with iThenticate and Crossref Similarity Check?
AI scoring and similarity checking cover different risks and almost never overlap. iThenticate and Crossref Similarity Check compare your manuscript against published literature to detect plagiarism and self-plagiarism. TextSight checks whether your prose reads AI-generated to a classifier of the kind major publishers now run before peer review. Pre-flighting both is the sober move for a journal submission, and TextSight does not interact with either pipeline.
Do journals actually run AI screening before peer review?
Many do. Nature, Science, Cell, Lancet, JAMA, NEJM, IEEE, ACM, ACS, RSC, Elsevier, Wiley, Springer, and PLoS have all published AI-use policies since 2024. Several run internal classifiers on submissions before the desk-review stage. A flag does not always mean rejection but usually triggers an editor query about AI use and can delay the review timeline by weeks. Disclosure is almost always required regardless of score.
I am an ESL researcher. Will the scorer flag me unfairly?
ESL researchers face false-positive rates roughly three to five times higher on most detectors, because the more formal, less idiomatic register typical of non-native academic writing overlaps with patterns detectors learn from the model side. TextSight is calibrated against an ESL academic sample and returns roughly 40 percent fewer false positives on that register than the average competitor in our internal evals. The score is gentler for you, not stricter.
Will scanning my dissertation trigger plagiarism systems?
No. TextSight does not index your text into any public corpus and does not store scan content beyond the session unless you save the result to your dashboard. Pasted text is processed in-memory and discarded. Your draft will not appear in Turnitin, iThenticate, or any other similarity database as a result of scanning.
Do graduate students and postdocs get a discount?
Yes. Researchers with a verified .edu email get Pro at $13.99 per month instead of the standard $19.99, with the full 50,000 AI rewriter words per month and access to all three modes. The discount applies the same way to faculty addresses and is applied at signup once the email is verified.
How long does the section-by-section workflow take on a typical paper?
Around 40 to 65 minutes end to end on a 6,000-word manuscript. Scan each of the seven sections separately (8 to 12 minutes total), identify red sentences across sections (around 10 minutes), revise the flagged paragraphs in your own voice (15 to 35 minutes), then re-scan to verify the score lifted (5 to 8 minutes). Longer manuscripts in the 10,000 to 15,000 word range scale linearly and are best handled across two sittings.

Pre-screen your manuscript. Submit in your own voice.

Free to try, no card. Section-by-section calibration, sentence-level evidence, ESL-aware, citations preserved. .edu Pro at $13.99.

Pre-iThenticate calibration · Citations preserved · Built for authentic voice, not a screener workaround