PDF AI Detector: native .pdf upload, sentence highlights.

Drop a finished .pdf into the dashboard and the file-extract endpoint, built on officeparser v7, pulls the text server-side, preserves paragraph structure on text-extractable PDFs, and feeds the extracted text into the same AI classifier that runs on pasted text. You get sentence-level highlights, an Authenticity Score, and a band classification on the file you actually have, not on a copy-paste that mangled the columns. One endpoint accepts eleven file formats. The free tier includes native .pdf upload, with no card and no email required.

Native .pdf upload · 11-format file extract · Sentence-level highlights
Pricing

Pricing for a PDF-heavy workflow.

The free tier needs no card and no email, and native .pdf upload is included. Paid tiers are billed in USD, and yearly billing saves 25%. Full details are on the pricing page.

Free
$0/forever

 

Try the PDF detector. No card, no email.
  • 3 scans / day
  • 5,000 chars per scan
  • Native .pdf upload
  • Sentence-level highlights
Starter
$7.49/month

Billed $89.88/year (save $30)

For students and light writers checking PDFs a few times per day.
  • 20 scans / day
  • 20,000 AI rewriter words/mo
  • 11-format file extract
  • Email support
Business
$29.99/month

Billed $359.88/year (save $120)

For agencies and small content teams running shared PDF workflows.
  • 100,000 AI rewriter words/mo
  • REST API access
  • 5 team seats
  • White-label PDFs

Yearly billing saves 25%.
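
The yearly figures on the cards can be checked with a few lines of arithmetic. The sketch below assumes that each "save $X" figure is measured against twelve months of an undiscounted monthly price, which the cards do not state outright; the function name is hypothetical. Under that assumption the implied monthly list prices come out at $9.99 for Starter and $39.99 for Business.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def yearly_breakdown(billed_per_year: str, saved_per_year: str) -> dict:
    """Derive monthly figures from a card's yearly total and its saving.

    Assumes the saving is measured against 12 months of the undiscounted
    monthly price (an assumption; the pricing cards do not say so explicitly).
    """
    billed = Decimal(billed_per_year)
    saved = Decimal(saved_per_year)
    list_per_year = billed + saved  # what 12 undiscounted months would cost
    return {
        "effective_monthly": (billed / 12).quantize(CENT, ROUND_HALF_UP),
        "list_monthly": (list_per_year / 12).quantize(CENT, ROUND_HALF_UP),
        "percent_saved": (saved / list_per_year * 100).quantize(Decimal("0.1"), ROUND_HALF_UP),
    }


# Starter card: billed $89.88/year, save $30.
print(yearly_breakdown("89.88", "30"))
# Business card: billed $359.88/year, save $120.
print(yearly_breakdown("359.88", "120"))
```

Both cards come out at a 25.0% saving, which matches the stated 25%. Decimal arithmetic is used instead of floats so that currency amounts round exactly to the cent.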

How it works

Four steps from .pdf to sentence highlights.

The path is intentionally short. A writer arrives with a finished PDF, and the workflow respects that by skipping every step a copy-paste would force.

1. Upload the .pdf

Open the dashboard at app.textsight.ai and drop the file directly onto the scan area, or click the upload control and pick the file from a folder. The endpoint accepts PDF natively alongside ten other formats. There is no plug-in to install and no browser extension to authorise. A typical essay-length PDF reaches the server in under two seconds on a normal home connection.

2. File extract runs server-side

The file-extract endpoint built on officeparser v7 pulls the selectable text layer from the PDF, normalises character encoding, and reconstructs paragraph boundaries on text-extractable files. The endpoint is extract-only and does not burn a separate detection quota; the extracted text feeds straight into the standard AI detection pipeline. The textarea then fills with what the classifier will read, so the writer can sanity-check the extraction before hitting Scan.
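
To make the extract step more concrete, here is a minimal Python sketch of the kind of post-processing it describes: Unicode normalisation, rejoining words hyphenated across line breaks, and rebuilding paragraphs from blank-line gaps. It is not the officeparser v7 API or TextSight's code; the function name and the blank-line heuristic are assumptions for illustration.

```python
import re
import unicodedata


def rebuild_paragraphs(raw: str) -> list[str]:
    """Turn raw extracted PDF text into a list of clean paragraphs.

    Hypothetical post-processing for illustration; not officeparser's API.
    """
    # Compose accented characters into one code point ("e" + U+0301 -> "é").
    text = unicodedata.normalize("NFC", raw)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words split by a hyphen at a line break: "classi-\nfier" -> "classifier".
    # Trade-off: a genuine compound broken at a line end ("well-\nknown") is joined too.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    paragraphs = []
    # A blank line marks a paragraph boundary; single newlines are layout wraps.
    for block in re.split(r"\n\s*\n", text):
        joined = " ".join(line.strip() for line in block.split("\n") if line.strip())
        if joined:
            paragraphs.append(joined)
    return paragraphs


raw = "Cafe\u0301 culture is\nchanging.\n\nThe classi-\nfier reads this."
print(rebuild_paragraphs(raw))
```

The blank-line rule is the simplest possible paragraph signal; a production extractor would also lean on the PDF's own layout information.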

3. AI classifier returns a 0 to 100 score

The same classifier that runs on pasted text scores the extracted PDF text. The headline result is an Authenticity Score on a 0 to 100 scale, plus a band classification (Likely Human, Mixed, Likely AI). The classifier looks for patterns learned from the output of GPT-3.5, GPT-4, GPT-4o, GPT-5, Claude, Gemini, Llama, and Mistral, so the result is not tied to one model family.
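
This page does not publish the cut-offs between bands. The sketch below shows how a 0 to 100 Authenticity Score could map onto the three bands. It assumes higher scores mean more human-like text and uses hypothetical thresholds of 70 and 30. Treat both numbers as placeholders, not TextSight's calibration.

```python
def band_for(authenticity_score: float, human_floor: float = 70.0, ai_ceiling: float = 30.0) -> str:
    """Map a 0-100 Authenticity Score to a band.

    Assumes higher scores mean more human-like text. Both thresholds are
    hypothetical placeholders, not published TextSight cut-offs.
    """
    if not 0 <= authenticity_score <= 100:
        raise ValueError("Authenticity Score must be between 0 and 100")
    if authenticity_score >= human_floor:
        return "Likely Human"
    if authenticity_score <= ai_ceiling:
        return "Likely AI"
    return "Mixed"


for score in (88, 52, 12):
    print(score, band_for(score))
```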

4. Review sentence-level highlights

The result panel colours each sentence by AI likelihood. A teacher knows which paragraphs to question; a writer knows which lines to rewrite. Sentence-level evidence beats a single document-level verdict: a 30-page PDF that scores 62% AI overall tells you nothing actionable until you know which sentences drove that number. Click Rewrite on any flagged sentence to rewrite it inline, then re-scan to confirm the score moved.
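
A rough sketch shows why sentence-level output is more actionable. It splits the text into sentences, scores each one, and surfaces the flagged sentences ranked by likelihood. The `ai_probability` callable stands in for the classifier, which is not exposed here. The regex splitter is deliberately naive and will mis-split abbreviations such as "e.g.". Both, along with the 0.5 threshold, are assumptions.

```python
import re
from typing import Callable

# Naive boundary: end punctuation, whitespace, then a capital letter, digit, or quote.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9\"'])")


def split_sentences(text: str) -> list[str]:
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]


def flag_sentences(
    text: str,
    ai_probability: Callable[[str], float],
    threshold: float = 0.5,
) -> list[tuple[str, float]]:
    """Return sentences scoring at or above the threshold, most AI-like first."""
    scored = [(s, ai_probability(s)) for s in split_sentences(text)]
    flagged = [(s, p) for s, p in scored if p >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Toy stand-in scorer for illustration only; a real classifier is far more involved.
def toy_scorer(sentence: str) -> float:
    return 0.9 if "delve" in sentence.lower() else 0.1


sample = "I drafted this on the train. Let us delve into the synergies. Then I edited it."
for sentence, p in flag_sentences(sample, toy_scorer):
    print(f"{p:.2f}  {sentence}")
```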

PDF use cases

What kind of PDF are you scanning?

Five common PDF workflows we see for AI auditing. The detector handles each cleanly when the file is text-extractable; the honest weak spots show up only on scanned image PDFs and dense multi-column journal layouts.

Thesis and dissertation chapters in PDF

The most common PDF AI audit. A chapter exported from Word or LaTeX as a PDF goes through the file-extract endpoint with paragraph structure preserved, and the sentence-level highlights flag the exact lines that read as AI-generated. The ESL calibration matters for students writing in formally taught academic English. For a multi-chapter dissertation, split the file by chapter so the highlights stay readable; the Pro tier removes the daily scan ceiling.

Contracts and legal drafts that arrive as PDFs

Counterparty contracts almost always arrive as PDFs. Sentence-level highlights make a clause-by-clause audit tractable, and the file-extract endpoint handles standard contract layouts cleanly. Split long contracts by section or named clause for readable scan results; the band classification gives a fast read on whether the draft leans human or AI-generated before the redline pass begins.

Research papers downloaded from journal sites

Journal preprints and accepted papers download as PDFs with rich layouts. The classifier scores the extracted text correctly, with the caveat that papers with floating figures, multi-page tables, and dense footnotes may lose some paragraph structure during extraction. The visual reading order on the result may not match the original page order; a quick visual sanity check against the source PDF is worth running.

RFP responses and proposal PDFs

RFP responses are increasingly drafted with AI assistance, and the buyer side wants to know how much. Upload the response PDF, read the sentence highlights, and the audit is done in the time a manual read of section one would have taken. The same workflow runs on the sell side as a pre-submission sanity check on a vendor pitch deck or proposal narrative.

Scanned essays that have been OCR processed already

Once OCR has turned a scanned image into a searchable PDF or plain text, the file-extract endpoint treats it identically to any other text-extractable PDF. The detector does not run OCR itself; if the scan has not been processed yet, run it through any OCR tool first and upload the searchable PDF.

PDF format honesty

What can and cannot be extracted.

PDF is a layout-preserving container rather than a semantic document format. Being upfront about what extraction can and cannot do matters more on PDF than on plain text, because the same .pdf extension covers wildly different file internals.

Text-extractable PDFs work cleanly

Most modern PDFs contain a selectable text layer alongside the visual layout. The file-extract endpoint pulls clean characters from these files; paragraph structure survives, sentence boundaries survive, and the classifier scores the result identically to a paste. Standard contracts, exported Word documents saved as PDF, journal preprints, RFP responses, and report exports almost always fall in this category. The fast way to check: open the PDF in any reader, hit select-all, and watch the text highlight cleanly. If it does, the file is text-extractable.

Scanned image PDFs need OCR first

A PDF produced by a flatbed scanner or built from phone photos contains pixels rather than characters. The file-extract endpoint finds no text layer, so extraction returns an empty string and there is nothing for the classifier to score. TextSight has no built-in OCR layer, so scanned image PDFs need OCR pre-processing before upload. Many PDF tools offer an OCR action; Adobe Acrobat, Apple Preview, and ABBYY all handle this job well.
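
The empty-string symptom is easy to catch before a scan is attempted. A minimal sketch follows. It assumes the extracted text arrives as one string per page. The `needs_ocr` name, the 20-character threshold, and the more-than-half-the-pages rule are hypothetical choices, not TextSight behaviour.

```python
def needs_ocr(pages_text: list[str], min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF is image-only from its extracted per-page text.

    A scanned image PDF has no text layer, so extraction yields empty or
    near-empty strings. The threshold is a hypothetical placeholder.
    """
    if not pages_text:
        return True
    # Count visible characters on each page, ignoring all whitespace.
    visible = ["".join(page.split()) for page in pages_text]
    sparse_pages = sum(1 for chars in visible if len(chars) < min_chars_per_page)
    # Call it image-only when more than half the pages are (nearly) empty.
    return sparse_pages * 2 > len(pages_text)


print(needs_ocr(["", "   ", "\n"]))
print(needs_ocr(["This page has a real, selectable text layer with full sentences."]))
```

A file that trips this check is a candidate for OCR before upload; one that passes can go straight to the file-extract endpoint.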

Complex layouts may lose paragraph structure

Multi-column journal articles with floating figures, multi-page tables, footnotes, and sidebars are the hardest case. The classifier still scores the extracted text correctly, but the visual reading order on the result may not map back to the original page order. For thesis chapters, contracts, RFP bodies, and report exports the structure usually survives; for dense journal articles a sanity check against the original PDF is worth running before acting on the score.

Eleven formats, one endpoint

The file-extract endpoint does more than PDF.

The same officeparser v7 endpoint that handles native .pdf upload also handles ten other document formats, so the AI detection workflow stays the same whether the writer arrives with a PDF, a Word doc, an ePub, or a spreadsheet.

Document formats: PDF, DOCX, DOC, ODT, RTF, EPUB, TXT, HTML

These are the structure-aware formats. Paragraph boundaries, sentence boundaries, and basic body-text formatting survive extraction. The classifier sees what the writer wrote, not a flattened concatenation, which is what makes the sentence-level highlights map back to the source cleanly.

Tabular and slide formats: XLSX, PPTX, CSV

Spreadsheets and presentations are concatenated cell by cell or slide by slide and scored as a single block of text. The structure is shallower, but the highlights still render, and the result is useful for vetting a deck or a written-narrative column inside a spreadsheet. The endpoint is extract-only and does not burn a separate detection quota, whatever the format.
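
The split between the two groups can be sketched as a simple dispatch on file extension, with CSV shown as the concatenated case. This illustration uses only Python's standard library and assumes the extension is a reliable signal. It does not reproduce how officeparser v7 actually parses XLSX, PPTX, or CSV, and the function names are hypothetical.

```python
import csv
import io
from pathlib import Path

STRUCTURE_AWARE = {".pdf", ".docx", ".doc", ".odt", ".rtf", ".epub", ".txt", ".html"}
CONCATENATED = {".xlsx", ".pptx", ".csv"}


def extraction_mode(filename: str) -> str:
    """Classify an upload by extension into one of the two groups."""
    ext = Path(filename).suffix.lower()
    if ext in STRUCTURE_AWARE:
        return "structure-aware"
    if ext in CONCATENATED:
        return "concatenated"
    raise ValueError(f"unsupported format: {ext or '(no extension)'}")


def csv_to_text_block(csv_text: str) -> str:
    """Flatten a CSV cell by cell into one block of text for scoring."""
    reader = csv.reader(io.StringIO(csv_text))
    cells = [cell.strip() for row in reader for cell in row if cell.strip()]
    # One cell per line keeps each cell's sentences separate from its neighbours.
    return "\n".join(cells)


print(extraction_mode("Thesis-Chapter-3.PDF"))
print(csv_to_text_block('item,narrative\nQ1,"We grew revenue, then hired."\n'))
```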

The copy-paste problem

Why direct .pdf upload beats paste.

A paste from a PDF reader is not what the classifier should be reading. Three concrete reasons the upload path returns a more honest AI result on a finished PDF.

Paste destroys paragraph structure

Selecting all text in a PDF reader and pasting into a textarea collapses paragraph boundaries, drops indentation, and interleaves two-column layouts. The classifier reads a paragraph-aware document very differently from a flattened wall of text; the score moves and the sentence highlights become harder to map back to the source. The file-extract endpoint preserves paragraph boundaries on text-extractable PDFs, which is what makes the highlights actually useful.

Paste leaks artefacts that change the score

Page numbers, running headers, footnote markers, and cross-references all paste in as if they were body text. A header that repeats across thirty pages adds a thousand characters of noise that the classifier reads as part of the prose. The extract endpoint filters predictable layout artefacts on common PDF templates; the resulting text reads as the writer actually wrote it.
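
One common way to filter running headers and page numbers works in two passes. It drops lines that repeat across many pages, plus lines that contain only a page number. A minimal sketch follows. It assumes per-page text is available. The 50% repeat ratio and the page-number pattern are hypothetical. A genuinely repeated line of prose would also be removed, which is the trade-off of this heuristic.

```python
import math
import re
from collections import Counter

# Matches lines such as "7", "Page 7", or "Page 7 of 30".
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d+(?:\s+of\s+\d+)?\s*$", re.IGNORECASE)


def strip_layout_artefacts(pages: list[str], min_repeat_ratio: float = 0.5) -> list[str]:
    """Remove running headers/footers and bare page numbers from per-page text."""
    line_counts = Counter()
    for page in pages:
        # Count each distinct non-blank line once per page.
        line_counts.update({line.strip() for line in page.splitlines() if line.strip()})
    # A header must appear on at least min_repeat_ratio of the pages, and on two or more.
    threshold = max(2, math.ceil(len(pages) * min_repeat_ratio))
    repeated = {line for line, count in line_counts.items() if count >= threshold}
    cleaned = []
    for page in pages:
        # Blank lines are kept so paragraph gaps survive.
        kept = [
            line
            for line in page.splitlines()
            if line.strip() not in repeated and not PAGE_NUMBER.match(line)
        ]
        cleaned.append("\n".join(kept))
    return cleaned


pages = [
    "Acme Annual Report\nIntro text here.\n1",
    "Acme Annual Report\nMore body text.\n2",
    "Acme Annual Report\nConclusion.\nPage 3 of 3",
]
print(strip_layout_artefacts(pages))
```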

Paste forces the writer to fix formatting first

With copy-paste, a PDF audit stalls until the writer cleans up the pasted text. For a single essay that takes a minute. For ten PDFs a day it is a chore worth automating out of the workflow entirely. Direct upload makes the audit one drag-and-drop, one click, and one read of the sentence highlights, with no formatting cleanup in the middle.

FAQ

PDF AI detector: frequently asked questions.

Can I upload a .pdf directly to the AI detector without copy-paste?
Yes. TextSight accepts native .pdf upload through the file-extract endpoint built on officeparser v7. Drop the PDF into the dashboard, the text is extracted server-side, paragraph structure is preserved on text-extractable PDFs, and the same sentence-level highlights and Authenticity Score that pasting returns appear on the result. No selecting all, no pasting into a textarea, no broken line breaks across page footers.
Is .pdf upload available on the free tier or do I need to pay?
Native .pdf upload is on the free tier at the same quality as Pro. The file goes through the same officeparser v7 file-extract endpoint, the text layer is pulled server-side, and the sentence-level highlights render identically. The free tier caps daily volume at 3 scans per day and 5,000 characters per scan, roughly 800 words or about three double-spaced pages of a text-layer essay. Heavier workloads belong on Pro at $19.99 per month, or $14.99 per month billed yearly.
Does the file-extract endpoint accept formats besides PDF?
Yes. The endpoint accepts eleven formats through officeparser v7: PDF, DOCX, DOC, ODT, RTF, EPUB, TXT, HTML, XLSX, PPTX, and CSV. It is extract-only and feeds the extracted text into the standard AI detection pipeline; it does not burn a separate detection quota. The document formats (PDF, DOCX, DOC, ODT, RTF, EPUB, TXT, and HTML) are structure-aware; spreadsheets, presentations, and CSV files are concatenated into a single block and scored normally.
Does TextSight do OCR on scanned image PDFs?
TextSight extracts text directly from PDFs that contain a selectable text layer, which is what almost every modern exported PDF uses. PDFs that are scanned images of paper documents carry no embedded text and need OCR pre-processing before upload. The honest framing is that TextSight handles text-extractable PDFs cleanly and quickly; if the file is a flatbed scan or a phone photo, run it through any OCR tool first and upload the resulting searchable PDF.
Will complex PDF layouts like tables and figures keep their structure?
Paragraph structure is preserved on text-extractable PDFs that use a standard single-column or two-column body layout. Complex layouts with floating figures, multi-page tables, footnotes, or sidebars may lose some paragraph structure during extraction because PDF is a layout format rather than a semantic document format. The classifier still scores the extracted text correctly; the visual reading order on the result may not match the original page order. For thesis chapters, contracts, and RFP bodies the structure usually survives; for journal articles with figures and tables a quick visual sanity check is worth running.
Which PDF use cases does the AI detector handle best?
Thesis and dissertation chapters exported to PDF, counterparty contracts and legal drafts that arrive as PDFs, research papers downloaded from journal sites, RFP and proposal responses, and scanned essays that have already been OCR processed. The common thread is that the writer has a finished PDF on disk and needs an AI detection result without round-tripping through a text editor. The dashboard supports drag-and-drop directly onto the scan area for these workflows.
Is the PDF scan result identical to pasting the text?
Yes, for the same text. The same classifier runs whether the input arrives by paste or by upload, because the file-extract endpoint pulls the text first and feeds it into the standard AI detection pipeline. Identical input text returns identical sentence highlights, an identical Authenticity Score, and an identical band classification. A paste from a PDF reader often differs from the extracted text, though, because it can flatten paragraphs and carry page headers into the body. The upload path is faster for a writer who already has a finished PDF; the paste path is faster for a writer drafting inside the app or scanning a short snippet.
How long can a single PDF scan be?
The per-scan character cap is set by the active plan. Free is 5,000 characters per scan, roughly 800 words or about three double-spaced pages. Pro raises this to 10,000 characters per scan, Business to 50,000, and Enterprise to 150,000. Page count itself is not the gate; a short PDF with dense text can hit the ceiling before a long PDF with figures and white space. For very long documents above the ceiling, split the file at chapter or section boundaries and scan each part as its own job.
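
Splitting at boundaries can be automated. The minimal sketch below packs whole paragraphs into chunks that fit the per-scan caps. It assumes the cap counts every character, including whitespace, and that paragraph breaks are acceptable split points. The plan table covers only the plans named in this answer, and the function name is hypothetical.

```python
PER_SCAN_CHAR_CAPS = {"Free": 5_000, "Pro": 10_000, "Business": 50_000, "Enterprise": 150_000}


def split_for_plan(paragraphs: list[str], plan: str) -> list[str]:
    """Greedily pack whole paragraphs into chunks that each fit one scan."""
    cap = PER_SCAN_CHAR_CAPS[plan]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if len(para) > cap:
            raise ValueError(f"a single paragraph has {len(para)} characters; split it by hand")
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= cap:
            current = candidate
        else:
            # The next paragraph would overflow the cap: close this chunk and start another.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


parts = split_for_plan(["a" * 3_000, "b" * 3_000, "c" * 1_000], "Free")
print([len(p) for p in parts])
```

Greedy packing keeps chunks as full as possible, so it needs the fewest scans for a given paragraph order. Splitting at chapter headings instead would produce more, shorter jobs with cleaner boundaries.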

Score a PDF for AI. No copy-paste.

Free to try. No card. Native .pdf upload, sentence-level highlights, and the same classifier that powers the paste path.
