Drop a finished .pdf into the dashboard and the file-extract endpoint built on officeparser v7 pulls the text server-side, preserves paragraph structure on text-extractable PDFs, and feeds the extracted text into the same ChatGPT classifier that runs on pasted text. You get sentence-level highlights, an Authenticity Score, and a band classification on the file you actually have, not on a copy-paste that mangled the columns. One endpoint accepts eleven file formats. Free to try, no card, no email.
Free tier with no card, no email. Paid tiers billed in USD with yearly billing saving 25%. Full details on the pricing page.
Yearly plans are billed at $89.88/year (save $30), $179.88/year (save $60), or $359.88/year (save $120).
The path is intentionally short. A writer arrives with a finished PDF, and the workflow respects that by skipping every step that a copy-paste would force.
Open the dashboard at app.textsight.ai and drop the file directly onto the scan area, or click the upload control and pick the file from a folder. The endpoint accepts PDF natively alongside ten other formats. There is no plug-in to install and no browser extension to authorise. A typical essay-length PDF reaches the server in under two seconds on a normal home connection.
The file-extract endpoint built on officeparser v7 pulls the selectable text layer from the PDF, normalises character encoding, and reconstructs paragraph boundaries on text-extractable files. The endpoint is extract-only and does not burn a separate detection quota; the extracted text feeds straight into the standard ChatGPT detection pipeline. The textarea then fills with what the classifier will read, so the writer can sanity-check the extraction before hitting Scan.
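As a rough illustration of what "normalises character encoding and reconstructs paragraph boundaries" can mean in practice, here is a minimal Python sketch. It is not the TextSight or officeparser implementation; the function name `normalise_extracted_text` and the blank-line paragraph rule are assumptions made for this example.

```python
import re
import unicodedata


def normalise_extracted_text(raw: str) -> list[str]:
    """Return cleaned paragraphs from raw extracted text (illustrative only)."""
    # NFKC folds compatibility characters, e.g. the ligature "ﬁ" becomes "fi".
    text = unicodedata.normalize("NFKC", raw)
    # Use one line-ending convention before looking for paragraph breaks.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words hyphenated across a line break: "extrac-\ntion" -> "extraction".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    paragraphs = []
    # Assumption: one or more blank lines mark a paragraph boundary.
    for block in re.split(r"\n\s*\n", text):
        # Collapse soft line wraps inside a paragraph into single spaces.
        joined = " ".join(block.split())
        if joined:
            paragraphs.append(joined)
    return paragraphs
```

The hyphen rule is deliberately naive: it would also join a genuine compound such as "sentence-level" if the line happened to break at the hyphen, so a production extractor would need a dictionary check or layout information.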
The same classifier that runs on pasted text scores the extracted PDF text. The headline is an Authenticity Score on a 0 to 100 scale, with a band classification (Likely Human, Mixed, Likely AI). The classifier reads patterns trained across GPT-3.5, GPT-4, GPT-4o, GPT-5, Claude, Gemini, and Llama, so the result is not tied to one specific ChatGPT version.
The result panel colours each sentence by AI likelihood. A teacher knows which paragraphs to question; a writer knows which lines to rewrite. Sentence-level evidence beats a single document-level verdict because a 30-page PDF that scores 62% AI overall is useless without knowing which 14 sentences caused it. Click Rewrite on any flagged sentence to rewrite it inline, then re-scan to confirm the score moved.
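To make the sentence-level idea concrete, the sketch below picks out the sentences worth reviewing from a list of per-sentence scores. The `SentenceScore` shape, the 0.0 to 1.0 likelihood scale, and the 0.7 cutoff are all hypothetical; the format of the real result panel is not documented here.

```python
from dataclasses import dataclass


@dataclass
class SentenceScore:
    index: int            # position of the sentence in the document
    text: str
    ai_likelihood: float  # hypothetical 0.0-1.0 value, not a documented field


def flagged_sentences(scores: list[SentenceScore], cutoff: float = 0.7) -> list[SentenceScore]:
    """Return sentences at or above the cutoff, most suspicious first."""
    flagged = [s for s in scores if s.ai_likelihood >= cutoff]
    return sorted(flagged, key=lambda s: s.ai_likelihood, reverse=True)
```

Sorting by likelihood rather than by position puts the strongest candidates for a rewrite at the top; sort by `index` instead to review them in reading order.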
Five common PDF workflows we see for ChatGPT auditing. The detector handles each cleanly when the file is text-extractable; the honest weak spots show up only on scanned image PDFs and dense multi-column journal layouts.
Thesis and dissertation chapters are the most common ChatGPT PDF audit. A chapter exported from Word or LaTeX as a PDF goes through the file-extract endpoint with paragraph structure preserved, and the sentence-level highlights flag the exact lines that read as ChatGPT. The ESL calibration matters for students writing in formally-taught academic English. For a multi-chapter dissertation, split by chapter so the highlights stay readable; the Pro tier removes the daily scan ceiling.
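For writers who already have the dissertation as plain text, splitting by chapter can be sketched as follows. This is an illustration only: the heading pattern (a line starting with "Chapter" and a number) and the function name `split_by_chapter` are assumptions, and real documents may use different heading styles.

```python
import re

# Assumption: chapter headings sit on their own line, e.g. "Chapter 2 Methods".
CHAPTER_HEADING = re.compile(r"^chapter\s+\d+\b.*$", re.IGNORECASE | re.MULTILINE)


def split_by_chapter(text: str) -> list[tuple[str, str]]:
    """Return (heading, body) pairs, one per detected chapter."""
    matches = list(CHAPTER_HEADING.finditer(text))
    if not matches:
        # No headings found: treat the whole text as one unit.
        return [("Document", text.strip())]

    chapters = []
    front = text[: matches[0].start()].strip()
    if front:
        # Keep anything before the first heading (abstract, preface).
        chapters.append(("Front matter", front))
    for i, match in enumerate(matches):
        # A chapter body runs until the next heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((match.group(0).strip(), text[match.end():end].strip()))
    return chapters
```

A body line that happens to begin with "Chapter 3 describes..." would be mistaken for a heading, so the split is worth eyeballing before scanning each part.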
Counterparty contracts almost always arrive as PDFs. A clause-by-clause audit is exactly what sentence-level highlights make tractable; the file-extract endpoint handles standard contract layouts cleanly. Split long contracts by section or by named clause to keep the scan results readable. The band classification then gives a fast read on whether the draft leans human or AI-generated.
Journal preprints and accepted papers download as PDFs with rich layouts. The classifier scores the extracted text correctly, with the caveat that papers with floating figures, multi-page tables, and dense footnotes may lose some paragraph structure during extraction. The visual reading order on the result may not match the original page order; a quick visual sanity check against the source PDF is worth running.
RFP responses are increasingly drafted with ChatGPT assistance, and the buyer side wants to know how much. Upload the response PDF, read the sentence highlights, and the audit is done in the time a manual read of section one would have taken. The same workflow runs on the sell side as a pre-submission sanity check.
Once OCR has turned a scanned image into a searchable PDF or plain text, the file-extract endpoint treats it identically to any other text-extractable PDF. The detector does not run OCR itself; if the scan has not been processed yet, run it through any OCR tool first and upload the searchable PDF.
PDF is a layout-preserving container rather than a semantic document format. Being upfront about what extraction can and cannot do matters more on PDF than on plain text, because the same .pdf extension covers wildly different file internals.
Most modern PDFs contain a selectable text layer alongside the visual layout. The file-extract endpoint pulls clean characters from these files; paragraph structure survives, sentence boundaries survive, and the classifier scores the result identically to a paste. Standard contracts, exported Word documents saved as PDF, journal preprints, RFP responses, and report exports almost always fall in this category. The fast way to check: open the PDF in any reader, hit select-all, and watch the text highlight cleanly. If it does, the file is text-extractable.
A PDF run through a flatbed scanner or built from phone photos contains pixels rather than characters. The file-extract endpoint finds no text layer, extracts an empty string, and returns nothing useful. The honest position is that scanned image PDFs need OCR pre-processing before upload, rather than claiming a built-in OCR layer that does not exist on the TextSight side. Most modern PDF readers offer an OCR action under the file menu; Adobe Acrobat, Apple Preview, and ABBYY all handle this job well. Copyleaks at the paid tier is the strongest tool in the wider market for built-in OCR if that is a hard requirement.
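A simple way to catch image-only PDFs before scanning is to check how much text the extraction actually produced. The sketch below is a heuristic under stated assumptions: the function name `needs_ocr` and the threshold of 50 visible characters per page are hypothetical, not TextSight behaviour.

```python
def needs_ocr(extracted_text: str, page_count: int, min_chars_per_page: int = 50) -> bool:
    """Guess whether a PDF is image-only from how little text was extracted."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    # Count only visible characters; whitespace and form feeds carry no content.
    visible = sum(1 for ch in extracted_text if not ch.isspace())
    return visible / page_count < min_chars_per_page
```

A near-empty result on a multi-page file is a strong hint that the pages are images and the file should go through an OCR tool first. A short one-page cover letter could also fall under the threshold, so treat the result as a prompt to check the file, not a verdict.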
Multi-column journal articles with floating figures, multi-page tables, footnotes, and sidebars are the hardest case. The classifier still scores the extracted text correctly, but the visual reading order on the result may not map back to the original page order. For thesis chapters, contracts, RFP bodies, and report exports the structure usually survives; for dense journal articles a sanity check against the original PDF is worth running before acting on the score.
A paste from a PDF reader is not what the classifier should be reading. Three concrete reasons the upload path returns a more honest ChatGPT result on a finished PDF.
Selecting all text in a PDF reader and pasting into a textarea collapses paragraph boundaries, drops indentation, and interleaves two-column layouts. The classifier reads a paragraph-aware document very differently from a flattened wall of text; the score moves and the sentence highlights become harder to map back to the source. The file-extract endpoint preserves paragraph boundaries on text-extractable PDFs, which is what makes the highlights actually useful.
Page numbers, running headers, footnote markers, and cross-references all paste in as if they were body text. A header that repeats across thirty pages adds a thousand characters of noise that the classifier reads as part of the prose. The extract endpoint filters predictable layout artefacts on common PDF templates; the resulting text reads as the writer actually wrote it.
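One common way to filter running headers and page numbers is to drop any line that recurs on most pages. The sketch below shows that general technique on per-page text; it is not the extract endpoint's actual filter, and the name `strip_repeated_lines` and the 60% share threshold are assumptions for illustration.

```python
import re
from collections import Counter


def _line_key(line: str) -> str:
    # Replace digit runs so "Page 3 of 30" and "Page 4 of 30" compare equal.
    return re.sub(r"\d+", "#", line.strip())


def strip_repeated_lines(pages: list[str], min_share: float = 0.6) -> list[str]:
    """Drop lines that recur on at least min_share of pages (and on 2+ pages)."""
    counts: Counter[str] = Counter()
    for page in pages:
        # Count each distinct line once per page, ignoring blank lines.
        counts.update({_line_key(l) for l in page.splitlines() if l.strip()})

    threshold = max(2, min_share * len(pages))
    repeated = {key for key, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        kept = [l for l in page.splitlines() if _line_key(l) not in repeated]
        cleaned.append("\n".join(kept).strip())
    return cleaned
```

Requiring at least two occurrences means a single-page document is never altered. The trade-off is that a genuine body line repeated on most pages, such as a recurring one-line refrain, would be removed too.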
A real PDF audit blocks on the writer cleaning up the paste before scanning. For a single essay that is a minute. For ten PDFs a day it is a chore worth automating out of the workflow entirely. Direct upload makes the audit one drag-and-drop, one click, one read of the sentence highlights, with no formatting cleanup in the middle.
Six-tool ranking across PDF detection, weighted on native upload, structure preservation, and OCR honesty. See the ranking →
Scan many files in one pass with the bulk endpoint and the same sentence-level highlights per result. Read the guide →
The general ChatGPT detector, the paste-first flow, model coverage, and how the classifier scores text. Read the deep dive →
Full tier breakdown for Free, Starter, Pro, and Business. Annual billing saves 25%. See pricing →
Free to try. No card. Native .pdf upload, sentence-level highlights, and the same classifier that powers the paste path.