Manual invoice entry has a failure rate that compounds quietly. A bookkeeper keying 200 invoices a month at a 2% typo rate produces four wrong amounts a month — and you usually find out at reconciliation, weeks later. The fix isn’t OCR software with a four-figure license; it’s an hour of n8n setup and an LLM that’s genuinely good at reading messy PDFs.
This tutorial builds a pipeline that watches a Gmail inbox for invoices, extracts the PDF text, runs it through an LLM with a strict JSON schema, writes a clean row to Google Sheets (with a note on swapping in QuickBooks), and flags anything suspicious — duplicate invoice numbers, totals that don’t match line items, unfamiliar vendors — for human review before it touches your books.
Architecture overview
Gmail Trigger (has:attachment filter)
→ IF (is it a PDF?)
→ Extract from File (PDF → text)
→ OpenAI (structured extraction, JSON schema)
→ Code (validate + anomaly checks)
→ IF (anomaly?)
true → Slack #finance-review
false → Google Sheets (append row)
You’ll need: n8n (cloud or self-hosted), a Google account with Gmail and Sheets access, an OpenAI API key, and optionally Slack for the review channel.
Prefer to import and adapt? The full workflow is here — import via Workflow → Import from File, then re-link your credentials:
Free template · n8n
Invoice Email Parser
invoice-parser-n8n.json
Step 1: Gmail trigger with a tight filter
- Add a Gmail Trigger node. Authenticate with OAuth2 — when you create the Google credential, the consent screen must grant the
gmail.readonlyscope at minimum (n8n’s default Gmail scopes cover this, but a Workspace admin may need to allowlist the app). - Set Event to
Message Received, Poll Time to every 5 minutes. - Under Filters, set Search to:
has:attachment filename:pdf (invoice OR rechnung OR factura OR "inv-" OR bill)
- Enable Download Attachments and set the Attachment Prefix to
attachment_.
The search filter is your first line of defense against junk. If vendors email a dedicated address like ap@yourcompany.com, add to:ap@yourcompany.com and the keyword list becomes optional. A practical tip from production use: create a Gmail label invoices-processed now; we’ll apply it at the end of the flow so reprocessing after an error is a one-click affair.
Step 2: Confirm it’s a PDF and extract the text
- Add an IF node: condition String →
{{ $binary.attachment_0.mimeType }}→ equals →application/pdf. Non-PDF attachments (inline logos, .png signatures) fall out here. - Add an Extract from File node. Set Operation to
Extract From PDF, Input Binary Field toattachment_0, and enable Join Pages.
Run a test with a real invoice email. You should see the raw text in the output — addresses, line items, totals, all as one unstructured blob. That blob quality varies wildly by vendor, which is exactly why we use an LLM instead of regex.
Scanned invoices (image-only PDFs) produce empty text here. If a meaningful share of your invoices are scans, branch on {{ $json.text.length < 50 }} and route those through a vision-capable model instead: convert the PDF page to an image and send it to the LLM’s image input. The extraction prompt below works unchanged for vision models.
Step 3: Structured extraction with a JSON schema prompt
- Add an OpenAI node (Chat → Message a Model). Use a current mid-tier model — invoice extraction needs more reliability than lead scoring, so don’t reach for the absolute cheapest tier. As of mid-2026, check current pricing; expect roughly a cent or two per invoice.
- Set temperature to
0and enable JSON/structured output mode.
System message — paste exactly:
You are an invoice data extraction engine. You receive raw text
extracted from a PDF invoice. Return ONLY a JSON object matching
this schema, no markdown, no explanations:
{
"vendor_name": "string",
"vendor_tax_id": "string or null",
"invoice_number": "string",
"invoice_date": "YYYY-MM-DD",
"due_date": "YYYY-MM-DD or null",
"currency": "ISO 4217 code, e.g. USD",
"subtotal": number,
"tax_amount": number,
"total": number,
"line_items": [
{ "description": "string", "quantity": number, "unit_price": number, "amount": number }
],
"payment_terms": "string or null",
"confidence": number between 0 and 1
}
Rules:
- Numbers must be plain JSON numbers: no currency symbols, no
thousands separators. European formats like 1.234,56 become 1234.56.
- If a field is genuinely absent from the text, use null. NEVER guess
an invoice number or total.
- Dates: convert any format to YYYY-MM-DD. If ambiguous (03/04/2026),
use the vendor's country context if visible; otherwise prefer
DD/MM for EU vendors and MM/DD for US vendors and lower confidence.
- Set confidence below 0.7 if the text is garbled, totals are unclear,
or you had to make any judgment call.
User message:
{{ $json.text }}
The confidence field is the load-bearing part of this prompt. Models are decent at self-reporting uncertainty when you give them explicit criteria, and it gives the next step a routing signal that pure extraction schemas lack.
Step 4: Validate and flag anomalies in a Code node
Never trust LLM arithmetic or formatting blindly. Add a Code node:
const raw = $input.first().json.message?.content ?? $input.first().json.content;
let inv;
try {
inv = JSON.parse(String(raw).replace(/```json|```/g, '').trim());
} catch (e) {
return [{ json: { anomaly: true, anomaly_reasons: ['Malformed JSON from LLM'], raw } }];
}
const reasons = [];
// 1. Line items must sum to subtotal (1% tolerance for rounding)
const lineSum = (inv.line_items || []).reduce((s, li) => s + (li.amount || 0), 0);
if (inv.subtotal && Math.abs(lineSum - inv.subtotal) / inv.subtotal > 0.01) {
reasons.push(`Line items (${lineSum.toFixed(2)}) ≠ subtotal (${inv.subtotal})`);
}
// 2. Subtotal + tax must equal total
if (inv.total && Math.abs((inv.subtotal + inv.tax_amount) - inv.total) > 0.05) {
reasons.push(`Subtotal+tax ≠ total (${inv.total})`);
}
// 3. Model wasn't sure
if ((inv.confidence ?? 0) < 0.7) reasons.push(`Low confidence: ${inv.confidence}`);
// 4. Missing critical fields
if (!inv.invoice_number || !inv.total) reasons.push('Missing invoice number or total');
// 5. Suspiciously large amount — tune to your business
if (inv.total > 10000) reasons.push(`Total over review threshold: ${inv.total}`);
return [{ json: { ...inv, line_sum: lineSum, anomaly: reasons.length > 0, anomaly_reasons: reasons,
source_email: $('Gmail Trigger').first().json.from?.value?.[0]?.address ?? '' } }];
For duplicate detection, add a Google Sheets node (operation Get Rows, filter on invoice_number) before this Code node and check whether the lookup returned anything — a hit means you’ve already booked it. The template wires this up.
Step 5: Route and write
- Add an IF node: Boolean →
{{ $json.anomaly }}→ is true. - True branch — Slack node, Send Message to
#finance-review:
⚠️ Invoice needs review: {{ $json.vendor_name }} — {{ $json.invoice_number }}
Total: {{ $json.currency }} {{ $json.total }}
Flags: {{ $json.anomaly_reasons.join('; ') }}
From: {{ $json.source_email }}
- False branch — Google Sheets node, operation Append Row, mapping
invoice_date,vendor_name,invoice_number,currency,subtotal,tax_amount,total,payment_terms, andconfidenceto columns. Use Append mode, not Append or Update — you already dedupe upstream. - On both branches, finish with a Gmail node (operation Add Label) applying your
invoices-processedlabel to{{ $('Gmail Trigger').first().json.id }}.
QuickBooks instead of Sheets: swap the Sheets node for the QuickBooks Online node, resource Bill, operation Create. You’ll need to map vendor_name to a QuickBooks Vendor ref — add a “Get Vendor by display name” lookup first and route to the review channel when no vendor matches, because auto-creating vendors from LLM output is how typo-vendors breed in your chart of accounts. Many teams run Sheets for a month as a staging area, audit the accuracy, then cut over.
Try it yourself
n8n
The Code-node validation layer is the heart of this build — n8n gives you real JavaScript where other platforms make you fight formatters.
Start with n8nThe Make variant
The same pipeline in Make takes about 30 minutes if you know the platform:
- Gmail → Watch Emails (criteria: has attachment, your search query), with Iterator over attachments.
- PDF.co or Make’s built-in text parser — Make has no native PDF text extraction, so you’ll use a third-party module (PDF.co, CloudConvert) or an HTTP call to a self-hosted extractor. This is the main friction point versus n8n.
- OpenAI → Create a Completion with the same system prompt. Enable “Parse JSON response” or follow with a JSON → Parse JSON module using a defined data structure — Make’s typed data structures are actually nicer than raw parsing here.
- Router with two filters:
anomaly = true→ Slack; otherwise → Google Sheets → Add a Row.
Validation math is the weak spot: replicating the Code-node checks means a chain of Set Variable modules with Make’s formula syntax, or a custom JS module on higher tiers. It works, but it’s noticeably more clicking.
When to pick Zapier vs Make vs n8n for invoice processing
- n8n is the strongest fit: native PDF extraction, a real code step for validation, and per-execution costs that don’t care whether you process 50 or 5,000 invoices a month.
- Make is solid if you’re already invested there and don’t mind a paid PDF module; its visual data structures make the JSON mapping pleasant.
- Zapier can do this (Email Parser or Gmail trigger + a PDF extraction app + OpenAI), but multi-step validation logic gets awkward and task costs add up on document-heavy flows. It’s the right pick only for very low volume with maximum simplicity.
See the full platform breakdown in Make vs Zapier vs n8n.
Common errors and fixes
Google OAuth fails with “access blocked” or 403 on Sheets writes. Your Google Cloud OAuth consent screen is missing the Sheets scope, or the app is in testing mode and the refresh token expired after 7 days. Publish the OAuth app (internal is fine for Workspace) and re-authenticate the credential.
Gmail trigger fires on the same email repeatedly. The trigger’s “read” tracking got reset, or you’re filtering with a label the workflow itself removes. Use the processed-label pattern from Step 5 and add -label:invoices-processed to the trigger’s search query.
Extract from File returns empty text. The PDF is a scan. Route low-text PDFs to a vision model as described in Step 2, or reject them to the review channel.
LLM returns numbers as strings (“1234.56”) or with currency symbols. Strings break the Sheets number columns and the validation math. The prompt’s formatting rules prevent most of this; for the rest, coerce in the Code node with parseFloat(String(x).replace(/[^0-9.-]/g, '')).
OpenAI 429s when a vendor sends a batch of 30 invoices at once. Set the workflow to process executions sequentially (queue mode on self-hosted, or a Loop Over Items node with batching) and enable Retry on Fail with a 10-second wait.
Slack webhook/message times out under load. Slack’s chat.postMessage rate limit is roughly 1 message per second per channel. Batch the review notifications: collect anomalies and send a digest instead of one message per invoice if you regularly flag more than a handful at a time.
What to build next
Once the Sheet has a month of clean data, the obvious extensions are a weekly spend-by-vendor digest and three-way matching against purchase orders. And if AP automation sold you on the extract-validate-route pattern, the same skeleton runs our lead capture workflow and support triage bot — different prompts, same discipline about never trusting raw LLM output.
Try it yourself
Make
Want this without self-hosting? Make's visual JSON data structures handle the extraction mapping cleanly.
Start with Make