Twelve copy-paste Claude Cowork prompts for extracting structured data — receipts, invoice headers, line items, forms, tables, contracts. Confidence flagging built in.
TL;DR. Twelve copy-paste prompts for turning unstructured documents (PDFs, images, scans) into clean tabular data, with confidence flagging built in. Confidence flagging is what separates a useful extraction prompt from a dangerous one. The first 90 days are spent calibrating where the review threshold should sit.
Read every image and PDF in /inbox/receipts-[period]/.
Extract per file: date, vendor, amount, currency, payment method.
Output to /output/receipts-[period].xlsx with one row per receipt + a column for source filename.
Add a confidence column (0-100). Rows below 90 confidence go to a "Review" tab.
Apply /inbox/category-rules.xlsx for vendor-to-category mapping.
Read every PDF in /inbox/invoices/.
Extract: vendor, invoice number, date, due date, total, currency, tax amount, PO number (if present).
Output to /output/invoice-headers.xlsx.
Flag missing required fields per row.
Preserve invoice number as text (leading zeros).
Read every PDF in /inbox/invoices/.
Extract line items: description, quantity, unit price, amount.
Output to /output/invoice-lines.xlsx — one row per line, with invoice-header reference column.
Confidence per row; sub-90 rows flagged for review.
Read every PDF in /inbox/forms/.
Extract the fields defined in /inbox/field-spec.md.
Output to /output/forms-extracted.xlsx — one column per field, one row per form.
For unanswered fields, output empty string (not N/A).
Source filename in the first column.
Read /inbox/[doc].pdf.
Identify all tables. For each, extract as a separate sheet in /output/[doc]-tables.xlsx.
Preserve column headers exactly.
Surface any table where structure is ambiguous to /output/extraction-warnings.md.
Read /inbox/id-scans/[name]/.
Extract: full name, document number, date of birth, expiry, issuing authority.
Output to /output/id-extracted.xlsx.
For any field below 95% confidence, leave blank and flag in a notes column.
Treat this as PII; do not save anything to Memory.
Read /inbox/emails-[topic]/.
Extract action items: owner, action, deadline, source-email-filename.
Output to /output/actions.xlsx.
If owner or deadline not stated, mark "[CONFIRM]".
Read every contract in /inbox/contracts/.
For each, extract per /inbox/clause-spec.md (e.g., termination, payment terms, liability cap, jurisdiction).
Output to /output/contract-summary.xlsx — one row per contract, one column per clause.
Include source filename + section reference.
For missing clauses, mark "Not found" — never infer.
Read /inbox/[deck].pptx.
For each slide: number, title, body text, speaker notes.
Output to /output/[deck]-content.xlsx.
Preserve slide order exactly.
Read every CSV in /inbox/dirty/.
Match columns against /CLAUDE.md/canonical-schema.
Map and rename columns to the canonical schema.
For columns Cowork can't map, leave with [UNMAPPED] prefix in /output/normalized/[file].csv.
Produce /output/mapping-report.md.
Read /inbox/handwritten/.
Extract text best-effort. For each file, output:
- best-guess transcription
- alternate readings where ambiguous
- confidence (low/medium/high)
Output to /output/handwriting.md.
Treat as low-trust output.
Read /inbox/duplicates-suspect/.
Identify documents that are likely versions of the same content (≥85% similarity).
Output /output/dedupe-report.csv with cluster, candidate-canonical, others.
Do not delete or move anything; this is a read-only report.
Confidence flagging is what separates a useful extraction prompt from a dangerous one. We never ship an extraction workflow without a confidence column and a review queue. The first 90 days of any extraction are spent calibrating where the threshold should sit — too permissive and bad data flows downstream, too strict and the operator drowns in review. The team that gets this calibration right ships a workflow that lasts; the team that skips it ships a workflow that gets quietly abandoned.
Book a 30-minute call. We'll ask where you are, what your team needs, and which systems Cowork should touch.