Realistic OCR accuracy expectations and the extraction prompt patterns that hold up at 50+ documents. When to step up to a dedicated OCR pipeline.
TL;DR. Cowork does OCR and structured extraction well enough for mid-market document workloads, with one honest caveat: form fields hit 95% accuracy, free-form line items hit 60–70%. Plan a human review step for the first 90 days. The prompt patterns below are what survive at scale (50+ documents per run, not just one).
If you can read it on a normal monitor without squinting, Cowork can probably extract it. If you have to zoom in twice to read it yourself, accuracy drops fast.
| Input | Typical accuracy |
|---|---|
| Form fields (named boxes) | 95–97% |
| Document headers and totals | ~95% |
| Free-form invoice line items | 60–70% — always review |
| Handwriting | Poor — last resort |
The gap between *extract* and *extract with confidence* is what determines whether a finance pipeline is genuinely automatable. A 95% number on totals is great; a 65% number on line items means the operator still has to scan every output.
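To make that gap concrete: per-field accuracy compounds across a document. A quick sketch, assuming errors are independent across fields (a simplification, and the 10-line invoice is a hypothetical example):

```python
def p_at_least_one_error(per_item_accuracy: float, items: int) -> float:
    """Probability that a document contains at least one extraction error,
    assuming errors are independent across extracted items."""
    return 1 - per_item_accuracy ** items

# A single total at 95% accuracy: 5% of documents need a correction.
print(round(p_at_least_one_error(0.95, 1), 3))   # 0.05

# Line items at 65%: a hypothetical 10-line invoice is almost never clean.
print(round(p_at_least_one_error(0.65, 10), 3))  # 0.987
```

This is why a 65% line-item number forces a review step even when the headline accuracy looks acceptable.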
Four rules that make extraction outputs usable:
- Declare a type for every field you extract.
- Require a confidence rating on every row (one confidence column saves a full manual review pass).
- Include the source filename so every row is auditable.
- Plan first: see the field mapping before processing starts.

Template:
For each PDF in ~/inbox/receipts:
- Extract: vendor (string), date (YYYY-MM-DD), total (number), currency (string),
category (string from CLAUDE.md list), confidence (low | medium | high).
- One row per receipt.
- Output to ~/output/receipts.csv with column headers in row 1.
- Include source_filename as the first column.
- Flag any row where confidence is low — separate sheet "Review queue".
- Plan first; show me the field mapping before processing.
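The low-confidence flag in the template maps to a simple post-processing step: split the output into a clean sheet and a review queue. A minimal sketch, assuming the column names from the template above (the sample rows are made up):

```python
def split_review_queue(rows):
    """Separate low-confidence rows into a review queue."""
    clean = [r for r in rows if r["confidence"] != "low"]
    review = [r for r in rows if r["confidence"] == "low"]
    return clean, review

# Hypothetical extracted rows, shaped like the template's output CSV.
rows = [
    {"source_filename": "r1.pdf", "vendor": "Acme", "total": "12.50", "confidence": "high"},
    {"source_filename": "r2.pdf", "vendor": "???", "total": "9.99", "confidence": "low"},
]
clean, review = split_review_queue(rows)
print(len(clean), len(review))  # 1 1
```

Writing the two lists to separate sheets (or separate CSVs) is then a straightforward `csv.DictWriter` step.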
The plan-first habit matters even more for extraction than for document generation, because errors are silent — a wrong number does not look wrong until someone reconciles it.
For 20 or more files, also ask for an extraction-manifest.csv — so you can audit which file produced which row. A 200-file extraction run produces a 200-row CSV plus a manifest plus a review queue. If your review queue is bigger than 30 rows, the prompt needs tightening before scaling further.
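The manifest enables a basic audit: every input file should appear in the output exactly once. A sketch, assuming the manifest lists one source filename per processed file (filenames illustrative):

```python
from collections import Counter

def audit(manifest_files, output_files):
    """Flag files that produced no output row (dropped) and files that
    produced more than one (duplicated)."""
    counts = Counter(output_files)
    missing = [f for f in manifest_files if counts[f] == 0]
    duplicated = [f for f in manifest_files if counts[f] > 1]
    return missing, duplicated

missing, dup = audit(["a.pdf", "b.pdf", "c.pdf"], ["a.pdf", "c.pdf", "c.pdf"])
print(missing, dup)  # ['b.pdf'] ['c.pdf']
```

Silently dropped files are the failure mode this catches: a 200-file run that produces a 198-row CSV looks fine until reconciliation.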
Cowork is not Tesseract, AWS Textract, or Google Document AI. If you are processing more than roughly 1,000 documents a day with strict accuracy SLAs, you want a dedicated OCR pipeline, with Cowork on top for the post-OCR work — summarisation, classification, exception handling, narrative drafting.
The Tinkso pattern in those cases: the dedicated pipeline handles the raw OCR; Cowork handles everything downstream.
Below 1,000 documents a day, Cowork alone is usually enough.
The 95% / 65% accuracy gap matters more than buyers expect on the first call. We tell clients to budget a human review step for the first 90 days of any extraction workflow. After 90 days, the prompt and CLAUDE.md are tuned enough that review shrinks to spot-checks — usually one in twenty.
The teams that get this wrong typically over-trust the totals (which are accurate) and under-review the line items (which are not). The fix is not better OCR; it is a better review queue.
Take 10 receipts. Run the extraction prompt template above. Hand-check every line item. The accuracy you measure on those 10 is your team's actual baseline — not the marketing number, not the vendor demo, not the average across the internet. Use that number to decide where the human-review threshold sits.
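Measuring that baseline is just a field-by-field comparison against your hand-checked values. A sketch, assuming you record each extracted value alongside its ground truth (the data shown is made up):

```python
def field_accuracy(pairs):
    """pairs: list of (extracted, ground_truth) tuples for one field type,
    collected across all hand-checked receipts."""
    correct = sum(1 for extracted, truth in pairs if extracted == truth)
    return correct / len(pairs)

# Hypothetical hand-check across 10 receipts: one OCR slip on a total.
totals = [("12.50", "12.50")] * 9 + [("12.5O", "12.50")]
print(field_accuracy(totals))  # 0.9
```

Compute this separately per field (totals, dates, line items): the per-field numbers, not the blended average, tell you where the human-review threshold belongs.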
Book a 30-minute call. We'll ask where you are, what your team needs, and which systems Cowork should touch.