An honest hub for the people grading the work. Where AI detection helps in a classroom, where it gets ESL writing wrong, what a defensible review process looks like in 2026, and which tool fits which job. Built for K-12 teachers, university faculty, integrity officers, and academic leaders deciding what to standardise on this semester.
Detection priorities differ across a high-school English department, a research-led seminar, and a registrar's office. Each guide below is written for one role with the constraints, vocabulary, and review process that role actually uses.
Classroom-level workflow for K-12. Catch likely AI work, talk to the student without accusing, and protect ESL writers from over-flagging. Free tier covers most classrooms.
Open teacher guide ›
Higher ed faculty
Seminar and lecture-scale workflow. Per-sentence rationale your TA can review, a documented threshold, and an appeal trail that holds up to integrity-office scrutiny.
Open professor guide ›
Institution leaders
Procurement-level view. Pricing across vendors, LMS integrations, audit trail, FERPA scope, ESL bias risk, and how to write an AI policy that does not box your faculty in.
Open university guide ›
High school
Age-appropriate detection workflow. How to introduce AI norms to grade 9 to 12, what counts as misuse, and how to avoid a public flag turning into a classroom incident.
Open high-school guide ›
Tool comparison
Head-to-head ranking of the major detectors on classroom needs. ESL false positives, batch upload, evidence quality, and which free tier actually survives a full semester.
See the ranking ›
Research and rigour
Higher-ed pick, ranked on transparency of methodology, exportable evidence, and resistance to paraphraser-laundered text. Includes published benchmark links per vendor.
See the ranking ›
A practical, no-panic read of where detection is genuinely useful and where it overpromises. Written by people who build detectors and have spoken to a lot of teachers.
Two things, mainly. First, screening. A scan across a class of essays surfaces the three or four submissions that read like raw model output, so you can focus your review time where it actually matters. Second, evidence-shaping. A per-sentence breakdown gives both you and the student something concrete to discuss, which is far more useful than the binary "this looks like AI" claim that ends most conversations.
Three things. It cannot prove a student used AI: a probability against a calibration set is not proof. It cannot reliably identify a polished ESL writer as human on its own: Stanford's 2023 study by Liang and colleagues measured several detectors flagging TOEFL essays at rates above 60 percent, and the field has improved but not solved this. And it cannot replace a writing conference, a draft review, or an in-class writing sample as the actual evidence base for an integrity conversation.
As a screening signal that triggers a conversation, not as a verdict. The 2024 to 2026 shift in academic-integrity guidance, including positions published by Turnitin and GPTZero themselves, is to treat detector output as one signal among several. The teachers we hear from who feel best about their process pair the scan with a writing conference, a draft history check, and a transparent AI-use policy that students read on day one. The teachers who feel worst tend to be the ones treating a single score as a verdict.
A process that holds up to an integrity-office review, a parent meeting, and a student appeal. None of these steps need a paid tool; they need a consistent habit and a documented threshold.
State the AI policy in the syllabus on day one. Say what is permitted, what is not, and that detection is part of your review. Students who know the rules dispute them less.
Pick a confidence threshold that triggers a conversation, not a sanction. Most teachers we hear from set it high (above 70 or 80 percent AI score) so screening is targeted.
Run the scan across the class. Flag the small number of submissions above your threshold. Do not treat the score as the finding; treat it as the reason to look closer.
Check the corroborating evidence: draft history, version log, prior in-class writing samples, and the student's discussion contributions. A flagged scan plus a register mismatch with classroom writing is a real signal.
Talk to the student. Ask them to walk you through the draft. Genuine authors describe their process; ghostwritten or AI work tends to stall on specifics. Be calm; assume good faith.
If you escalate, attach the detector report, a second detector check if you ran one, the draft history, the writing sample, and a note of the conversation. Your integrity office will need it.
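The steps above can be sketched as a small screening routine. This is a minimal illustration, not any vendor's API: the `Submission` record, the `screen` and `missing_evidence` functions, the sample scores, and the 0.80 threshold are all hypothetical, and the score is assumed to be whatever your detector reports, expressed as a number between 0 and 1.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the evidence items named in the escalation step.
REQUIRED_EVIDENCE = (
    "detector_report",
    "second_detector_check",
    "draft_history",
    "writing_sample",
    "conversation_note",
)


@dataclass
class Submission:
    student: str
    ai_score: float  # assumed detector output in the range 0 to 1
    evidence: set = field(default_factory=set)


def screen(submissions, threshold=0.80):
    """Return submissions scoring above the threshold.

    A flag means "look closer and talk to the student", never a finding.
    """
    return [s for s in submissions if s.ai_score > threshold]


def missing_evidence(submission):
    """List the evidence items still to collect before escalating."""
    return [item for item in REQUIRED_EVIDENCE if item not in submission.evidence]


essays = [
    Submission("A", 0.12),
    Submission("B", 0.91, {"detector_report", "draft_history"}),
    Submission("C", 0.78),
]
flagged = screen(essays)
for s in flagged:
    print(s.student, "needs a conversation; still missing:", missing_evidence(s))
```

Keeping the threshold as an explicit parameter is the point of the sketch: it is the documented number you can show an integrity office, and the evidence checklist makes an incomplete escalation obvious before it leaves your desk.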
The single biggest fairness problem with AI detection in classrooms is over-flagging non-native English writers. The mechanism is well understood and the mitigation is straightforward.
Most detectors lean on perplexity (how predictable each next word is) and burstiness (how that predictability varies across the document). Second-language academic writing, especially from students taught in a formal "high register" tradition, tends to use safer vocabulary and more uniform sentence structures. That looks like low perplexity and low burstiness, which is exactly what the model associates with machine generation. The signal is real; the conclusion is wrong.
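To make the two terms concrete, here is a rough sketch under stated assumptions. It takes per-token natural-log probabilities that some language model has already assigned to the text (hypothetical input; no real model is called), computes perplexity as the exponential of the mean negative log-probability, and treats burstiness as the spread of per-sentence perplexity. Real detectors define and combine these signals in their own ways, so read this as an illustration of the idea rather than any product's method.

```python
import math
import statistics


def perplexity(token_logprobs):
    """exp of the mean negative log-probability; lower means more predictable."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def burstiness(sentence_logprobs):
    """Population standard deviation of per-sentence perplexity."""
    return statistics.pstdev([perplexity(s) for s in sentence_logprobs])


# Hypothetical log-probabilities, one inner list per sentence.
uniform_doc = [[-1.0, -1.0, -1.0], [-1.0, -1.0]]  # every sentence equally predictable
varied_doc = [[-0.5, -0.5], [-3.0, -3.0]]         # one easy sentence, one surprising one

print("uniform:", burstiness(uniform_doc))
print("varied: ", burstiness(varied_doc))
```

A careful second-language essay with safe vocabulary and even sentence structure behaves like `uniform_doc`: predictable tokens and little variation between sentences, which is the pattern a detector built on these signals would read as machine-like.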
Liang et al. (2023), out of Stanford, tested seven widely used detectors on TOEFL essays written by non-native English speakers and found that, on average, more than 60 percent of those essays were flagged as AI-generated. Weber-Wulff et al. (2023), published in the International Journal for Educational Integrity, ran a broader audit and found detector false positive rates ranging from 0 to over 50 percent depending on the tool and writing sample. The field has improved since then. The mitigation has not changed.
First, calibrate. If your class includes ESL writers, raise your threshold and treat ESL submissions with extra care. Second, corroborate. Pair any high-score ESL flag with an in-class writing sample at the same complexity level and a conversation about the draft. The combination is far more reliable than the score alone, and far more defensible if a student appeals.
In our June 2026 internal benchmark on 100 ESL passages drawn from Indian, Filipino, and Chinese university student writing, TextSight measured a 6 percent false positive rate, against 22 percent for GPTZero, 17 percent for Turnitin's AI detector, and 19 percent for Originality.ai on the same passages. We treat that as the floor, not the ceiling, and re-run quarterly. The methodology and dataset link are on the accuracy methodology page below.
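For readers who want to sanity-check a figure like this, the arithmetic is simple. The sketch below assumes the 6 percent rate corresponds to 6 flagged passages out of 100 human-written ones, and adds a Wilson score interval to show how much uncertainty a 100-passage sample carries. The interval is our illustration here, not part of the published methodology, and the function names are hypothetical.

```python
import math


def false_positive_rate(flagged_human, total_human):
    """Share of human-written passages wrongly flagged as AI."""
    return flagged_human / total_human


def wilson_interval(flagged, n, z=1.96):
    """Approximate 95 percent Wilson score interval for a proportion."""
    p = flagged / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


fpr = false_positive_rate(6, 100)   # assumed counts behind the 6 percent figure
low, high = wilson_interval(6, 100)
print(f"FPR {fpr:.0%}, 95% interval roughly {low:.1%} to {high:.1%}")
```

With only 100 passages the interval is wide, which is one reason to re-run a benchmark on a schedule rather than quote a single measurement indefinitely.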
A short, specific syllabus paragraph beats a long policy document. Three things to state clearly, and one thing to leave out.
"Brainstorming with AI is fine. Using AI for grammar and spelling is fine. Using AI to draft your essay is not fine. Using AI to paraphrase a draft you did not write is not fine." Specific verbs beat vague principles. Students do not read a policy; they read the example.
"If you used AI for any part of the work, list it in a short Process Note at the end of the document: which tool, what for, how you verified the output. Failing to disclose is the violation, not the use itself." A disclosure norm reduces hidden use sharply.
"AI detection is part of our review. A flag triggers a short conversation, not a sanction. We will review your draft history, ask you to walk us through the work, and decide together. Decisions sit with the instructor; appeals go to the academic integrity office." Predictability lowers anxiety and lowers gaming.
Do not name a specific detector in the policy. Tools change; your policy should not have to. Detection vendors update, lose accuracy in specific edge cases, or get replaced; a policy that names "GPTZero" or "Turnitin AI" by brand ties your hands. Reference "an AI detection tool we use" and update internally.
Three real-world educator workflows and the tool that fits each best. Honest about where TextSight is not the right call.
If you grade hundreds of submissions a week, pulled from an LMS, with a queue your TAs share, GPTZero's educator tier and Turnitin's AI detection inside the existing Turnitin workflow are the right call today. Both ship with LMS hooks, a teacher review queue, and a student appeal flow that mirrors what your institution probably already documents. TextSight does not ship LMS integrations in 2026 and is the wrong tool for this job.
If you screen quickly and only look closely at the few essays that read off, TextSight is the strongest fit. The free tier handles three scans per day at 5,000 characters per scan with no signup. Each scan returns sentence-level highlights with per-line evidence (rhythm flat, vocabulary cluster, paragraph cadence), which is the kind of report you can actually share with a student in a writing conference. ESL false positive rates are roughly 40 percent lower than GPTZero on identical-quality essays in our internal testing.
For department heads and integrity officers picking one tool to standardise on, the decision is rarely about accuracy alone. It also turns on LMS integration, audit log, FERPA scope, support response time, ESL calibration, and price per student. The university-focused guide above walks through that procurement table vendor by vendor and is the right starting point.
All cluster pages are kept on the same review cadence and benchmark methodology.
K-12 classroom workflow, fair-conversation script, and an ESL-safe threshold most teachers settle on.
Open teacher guide ›
Higher-ed seminar workflow with TA review notes and appeal-ready evidence packaging.
Open professor guide ›
Procurement-grade view: LMS hooks, FERPA, audit trail, vendor pricing, and policy template.
Open university guide ›
Grade 9 to 12 detection workflow, age-appropriate conversation, and parent-meeting prep.
Open high-school guide ›
Detector ranking for K-12 use: ESL FPR, free tier durability, evidence quality, batch upload.
See the ranking ›
Higher-ed ranking on methodology transparency, exportable evidence, and paraphrase resistance.
See the ranking ›
Free tier is 3 scans a day at 5,000 characters per scan. No card, no signup, no commitment. Sentence-level highlights and per-line evidence on every scan.