AI Invoice Processing with TypeScript: Reactive Agent Tutorial
TL;DR: Stop writing spaghetti code to process invoices. Learn how to build an intelligent document processing pipeline that automatically extracts, validates, categorizes, and flags invoices - all without writing orchestration code. Using the @agentiny framework, we'll process invoices in a reactive, maintainable way. Works for invoices, receipts, contracts, resumes, and any document that makes you want to pull your hair out.
The Problem with Traditional Document Processing (Or: Why Your Invoice Code Probably Sucks)
Let me guess - you've written code like this before:
async function processInvoice(document: string) {
  try {
    const extracted = await callClaude("extract data", document);
    const validated = await callClaude("validate data", extracted);
    const categorized = await callClaude("categorize", validated);
    const anomalies = await callClaude("detect issues", categorized);
    const report = await callClaude("generate report", anomalies);
    return report;
  } catch (error) {
    // Handle errors for each step?
  }
}
If you've ever built an AI document processing system, automated invoice extraction, or tried to add intelligence to document workflows, you know this code. We've all written it. And we've all regretted it six months later when the business wants to add "just one more validation step."
This works, but it has problems:
- Brittle as hell: Change one step, rewrite the whole function. Want to add duplicate detection? Good luck finding where to slot it in.
- Hard to test: Each step is coupled to the next. Testing extraction means running the entire pipeline.
- Error-prone: Complex error handling at each step. Did the validation fail or did the API timeout? Who knows!
- Not reusable: Orchestration logic is welded to business logic. Want to use the same validation in another pipeline? Copy-paste time.
- Debugging nightmare: "It broke somewhere in the middle" is not a useful error message.
The worst part? This is supposed to be the simple version. Production document intelligence systems get way gnarlier. Add parallel processing, retries, webhooks, database persistence, audit logs, and approval workflows, and suddenly you're maintaining a 500-line async function that nobody wants to touch.
A Better Way: Reactive Agents for Document Automation
What if instead of telling the computer how to process documents step-by-step, you could just define when each step should happen?
// When we have a document, extract data
agent.once(hasDocument, [extractData]);
// When we have extracted data, validate it
agent.once(hasExtractedData, [validateData]);
// When validated, categorize
agent.once(hasValidation, [categorize]);
// And so on...
Then to process a document:
agent.setState({ documentContent: invoice });
// Everything else happens automatically!
This is the reactive agent pattern, and it's what we'll build today. No more orchestration spaghetti. No more "step 7 of 12" comments. Just clean, declarative rules that fire when conditions are met.
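To make the idea concrete, here is a toy sketch of what "rules that fire when conditions are met" can look like under the hood. It is not the @agentiny implementation: the class name `TinyReactiveAgent` and the demo state are made up for illustration, and actions are kept synchronous to keep it short, whereas real AI actions would be async.

```typescript
// Toy illustration of the pattern only; this is NOT how @agentiny is implemented.
type Condition<S> = (state: S) => boolean;
type SyncAction<S> = (state: S) => void;

interface Rule<S> {
  condition: Condition<S>;
  actions: SyncAction<S>[];
  fired: boolean;
}

class TinyReactiveAgent<S extends object> {
  private state: S;
  private rules: Rule<S>[] = [];

  constructor(initialState: S) {
    this.state = initialState;
  }

  // Register a rule that fires at most once, the first time its condition holds.
  once(condition: Condition<S>, actions: SyncAction<S>[]): void {
    this.rules.push({ condition, actions, fired: false });
  }

  // Merge a partial update, then keep re-checking rules until none can fire.
  setState(patch: Partial<S>): void {
    Object.assign(this.state, patch);
    let progressed = true;
    while (progressed) {
      progressed = false;
      for (const rule of this.rules) {
        if (!rule.fired && rule.condition(this.state)) {
          rule.fired = true;
          rule.actions.forEach((action) => action(this.state));
          progressed = true;
        }
      }
    }
  }

  getState(): S {
    return this.state;
  }
}

// Rules are registered "out of order" on purpose: the conditions decide the sequence.
interface DemoState {
  documentContent?: string;
  extracted?: string;
  validated?: boolean;
}

const demoAgent = new TinyReactiveAgent<DemoState>({});
demoAgent.once((s) => !!s.extracted, [(s) => { s.validated = true; }]);
demoAgent.once((s) => !!s.documentContent, [(s) => { s.extracted = (s.documentContent ?? "").trim(); }]);
demoAgent.setState({ documentContent: "  INV-001  " });
```

The point of the sketch is the loop in `setState`: nothing in the calling code says "extract, then validate". The order falls out of which conditions are true at each pass.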
Why I Built @agentiny (And Why You Might Need It Too)
I'm Thomas, and I run a web development business in Sydney.
I looked at existing agent frameworks. They were either:
- Too heavy - Installing half of npm to process a PDF
- Over-engineered - "Just read these 47 documentation pages..."
- Vendor lock-in - Married to a specific AI provider or cloud platform
I needed something tiny, simple, and actually usable for real client projects. Something I could drop into a TypeScript codebase and have working in 10 minutes.
So I built @agentiny.
It has:
- Zero dependencies
- Type-safe - Full TypeScript support. Your IDE actually helps you.
- Simple API - Triggers, conditions, actions. That's it.
- Async-first - Built for modern async workflows and AI API calls.
- Framework agnostic - Works with React, Vue, Node.js, Deno, Bun, whatever.
Perfect for document intelligence, workflow automation, event-driven systems, and anywhere you need AI to do things when conditions are met.
I'm actively improving it and adding features based on real project needs. If you have ideas or hit edge cases, the GitHub issues are open - I actually respond to them.
Building the Invoice Processor: From Zero to Production
Let's build a production-ready invoice processor that actually solves real problems:
- Extracts structured data from invoices (no more manual data entry)
- Validates the math (catches vendor mistakes)
- Categorizes expenses (automatic accounting)
- Detects anomalies (flags suspicious invoices)
- Generates executive summaries (for people who don't read invoices)
Step 1: Define Your State
interface InvoiceState {
  documentContent?: string;
  extractedData?: {
    vendor: string;
    items: Array<{ description: string; total: number }>;
    // subtotal and tax are requested in the extraction prompt and needed to validate the math
    subtotal: number;
    tax: number;
    total: number;
  };
  validationResults?: {
    mathCorrect: boolean;
    issues: string[];
  };
  category?: string;
  anomalies?: string[];
  report?: string;
}
This is your pipeline's data structure. Each stage adds more information. Think of it as a document moving through an assembly line, getting processed at each station.
Step 2: Create AI-Powered Actions
const extractDataAction = createAnthropicAction<InvoiceState>(
  { apiKey: process.env.ANTHROPIC_API_KEY! },
  {
    prompt: (state) => `
      Extract structured data from this invoice:
      ${state.documentContent}

      Return JSON with: vendor, items[], subtotal, tax, total
    `,
    onResponse: (response, state) => {
      state.extractedData = JSON.parse(response);
    },
  }
);
Each action is an async function that reads state, calls an AI model, and updates state. Self-contained, testable, reusable.
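One thing worth guarding in practice: a bare `JSON.parse` on a model reply will throw (or silently accept the wrong shape) if the model adds chatter around the JSON. Below is a minimal sketch of a stricter parser you could call from `onResponse`. The helper name `parseExtractedInvoice` and the trimmed-down `ExtractedInvoice` type are assumptions for illustration, not part of @agentiny.

```typescript
interface ExtractedInvoice {
  vendor: string;
  items: Array<{ description: string; total: number }>;
  total: number;
}

// Hypothetical helper: turn a raw model reply into a checked object, or throw a clear error.
function parseExtractedInvoice(response: string): ExtractedInvoice {
  // Tolerate text before/after the JSON by slicing from the first "{" to the last "}".
  const start = response.indexOf("{");
  const end = response.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model response");
  }

  const data: unknown = JSON.parse(response.slice(start, end + 1));
  if (typeof data !== "object" || data === null) {
    throw new Error("Model response is not a JSON object");
  }

  const { vendor, total, items } = data as Record<string, unknown>;
  if (typeof vendor !== "string") throw new Error("Missing or invalid 'vendor'");
  if (typeof total !== "number" || !Number.isFinite(total)) {
    throw new Error("Missing or invalid 'total'");
  }
  if (!Array.isArray(items)) throw new Error("Missing or invalid 'items'");

  const checkedItems = items.map((item: unknown, index: number) => {
    const row = (item ?? {}) as Record<string, unknown>;
    const description = row.description;
    const itemTotal = row.total;
    if (typeof description !== "string" || typeof itemTotal !== "number") {
      throw new Error(`Invalid item at index ${index}`);
    }
    return { description, total: itemTotal };
  });

  return { vendor, items: checkedItems, total };
}
```

Failing loudly here means a malformed reply surfaces as an extraction error instead of confusing the validation stage later.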
Step 3: Define Triggers (Where the Magic Happens)
// Extract when document loads
agent.once(
(state) => !!state.documentContent && !state.extractedData,
[extractDataAction]
);
// Validate after extraction
agent.once(
(state) => !!state.extractedData && !state.validationResults,
[validateDataAction]
);
// Continue the chain...
Triggers fire automatically when their conditions become true. No manual orchestration needed. Add a new step? Just add another trigger. Remove a step? Delete the trigger. The rest of the pipeline doesn't care.
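The earlier snippet used named conditions like `hasDocument` and `hasExtractedData` without showing them. Here is one way they might be written, as plain predicates over the state. The bodies are assumptions (they mirror the inline conditions above), and the state type is trimmed down so the block stands alone.

```typescript
// Trimmed-down copy of the state from Step 1, just enough for the predicates.
interface InvoiceState {
  documentContent?: string;
  extractedData?: { vendor: string; total: number };
  validationResults?: { mathCorrect: boolean; issues: string[] };
  category?: string;
}

type Predicate = (state: InvoiceState) => boolean;

// Each predicate is true only while its stage is "ready but not yet done".
const hasDocument: Predicate = (state) =>
  !!state.documentContent && !state.extractedData;

const hasExtractedData: Predicate = (state) =>
  !!state.extractedData && !state.validationResults;

const hasValidation: Predicate = (state) =>
  !!state.validationResults && !state.category;
```

Because they are ordinary functions, you can unit test the conditions on their own, separately from the actions they trigger.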
Step 4: Start Processing
await agent.start();
// Feed in a document - the pipeline executes automatically
agent.setState({
documentContent: invoiceText,
});
// Wait for completion
await agent.settle();
const result = agent.getState();
console.log(result.report); // Executive summary ready!
That's it. Drop in a document, get out structured, validated, categorized data with anomaly detection and a summary report.
The Complete Pipeline: Invoice Processing on Autopilot
Here's what happens automatically once you set documentContent:
Document loaded
↓
Extract structured data (vendor, items, totals)
↓
Validate math (check calculations)
↓
Categorize expense (Office, Travel, etc.)
↓
Detect anomalies (high amounts, errors)
↓
Generate report (executive summary)
Each arrow is a trigger that fires when the previous stage completes. The computer figures out the order. You just define the rules.
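The "validate math" stage doesn't have to rely on the model alone. As a sketch, a deterministic cross-check could look like the following. The function name `checkInvoiceMath`, the one-cent tolerance, and the sample numbers are assumptions for illustration; the return shape matches `validationResults` from Step 1.

```typescript
interface InvoiceTotals {
  items: Array<{ description: string; total: number }>;
  subtotal: number;
  tax: number;
  total: number;
}

// Hypothetical helper: compare line items, subtotal, tax, and total in whole cents.
function checkInvoiceMath(
  invoice: InvoiceTotals,
  toleranceCents = 1
): { mathCorrect: boolean; issues: string[] } {
  // Work in integer cents to avoid floating-point drift like 0.1 + 0.2.
  const toCents = (amount: number): number => Math.round(amount * 100);
  const issues: string[] = [];

  const itemsSum = invoice.items.reduce((sum, item) => sum + toCents(item.total), 0);
  const subtotal = toCents(invoice.subtotal);
  const tax = toCents(invoice.tax);
  const total = toCents(invoice.total);

  if (Math.abs(itemsSum - subtotal) > toleranceCents) {
    issues.push(`Line items sum to ${itemsSum / 100}, but subtotal is ${subtotal / 100}`);
  }
  if (Math.abs(subtotal + tax - total) > toleranceCents) {
    issues.push(`Subtotal plus tax is ${(subtotal + tax) / 100}, but total is ${total / 100}`);
  }

  return { mathCorrect: issues.length === 0, issues };
}

// Made-up invoice for demonstration.
const sampleCheck = checkInvoiceMath({
  items: [
    { description: "Paper", total: 50 },
    { description: "Toner", total: 25.5 },
  ],
  subtotal: 75.5,
  tax: 7.55,
  total: 83.05,
});
```

A check like this is cheap to run before or alongside the AI validation, and its result is reproducible, which helps when someone asks why an invoice was flagged.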
Sample Output: What You Actually Get
📊 INVOICE PROCESSING RESULTS
Vendor: ACME Office Supplies Inc.
Total: $222.88
Category: Office Supplies (98% confidence)
✅ Validation: All calculations correct
⚠️ Anomalies: None detected
📝 Summary:
ACME Office Supplies provided printer supplies and office
materials totaling $222.88. All calculations verified.
Standard office expense approved for processing.
Clean, structured, ready for your accounting system or database. No human had to read this invoice. No human had to type in the vendor name. No human had to double-check the math. The AI did it all.
Why This Approach Wins (And Traditional Approaches Lose)
1. Separation of Concerns
Each stage is independent. Change validation logic without touching extraction. Add a new categorization model without rewriting the anomaly detector. Your code becomes modular instead of a monolith.
2. Easy Testing
test("extraction works", async () => {
  const state: InvoiceState = { documentContent: "test invoice" };
  // Actions are async, so wait for the action to finish before asserting
  await extractDataAction(state);
  expect(state.extractedData).toBeDefined();
});
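Test each action in isolation, without running the rest of the pipeline. One caveat: an action that calls a model still hits the API unless you stub that call, so swap in a canned response when you want fast, deterministic tests.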
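If you want the test to run without touching the network, one option is to pass the model call in as a parameter and hand the action a stub. The sketch below assumes a hypothetical `makeExtractAction` factory and `ModelCall` type; neither comes from @agentiny, and the state type is trimmed down so the block stands alone.

```typescript
interface InvoiceState {
  documentContent?: string;
  extractedData?: { vendor: string; total: number };
}

// Anything that turns a prompt into a model reply; the real one would call the API.
type ModelCall = (prompt: string) => Promise<string>;

// Hypothetical factory: the model call is injected so tests can replace it.
function makeExtractAction(callModel: ModelCall) {
  return async (state: InvoiceState): Promise<void> => {
    const reply = await callModel(
      `Extract structured data from this invoice:\n${state.documentContent ?? ""}`
    );
    state.extractedData = JSON.parse(reply);
  };
}

// Stub that returns a canned reply instead of hitting the network.
const stubModel: ModelCall = async () =>
  JSON.stringify({ vendor: "ACME Office Supplies Inc.", total: 222.88 });

// Runs the action against the stub and returns the resulting state.
async function runExtractionCheck(): Promise<InvoiceState> {
  const state: InvoiceState = { documentContent: "test invoice" };
  await makeExtractAction(stubModel)(state);
  return state;
}
```

In a Jest or Vitest suite you would call `runExtractionCheck()` inside an `async` test and assert on the returned state.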
3. Incremental Development
Add stages one at a time. They integrate automatically.
// Later, add duplicate detection
agent.once(hasExtraction, [checkDuplicates]);
Ship the MVP with basic extraction and validation. Add categorization next sprint. Add anomaly detection when you have time. Each addition is a few lines of code, not a rewrite.
4. Clear Error Handling
const agent = new Agent({
initialState: {},
onError: (error) => {
logToService(error);
notifyTeam(error);
},
});
Errors get caught at the framework level. No try-catch blocks scattered everywhere. One place to handle failures.
5. Observable and Debuggable
Built-in logging shows exactly what's happening:
✓ Data extracted successfully
✓ Validation completed
✓ Categorized as: Office Supplies (98% confidence)
✓ No anomalies detected
✓ Report generated
When something breaks (and it will), you know exactly which stage failed. Debugging becomes "fix the broken stage" instead of "trace through 200 lines of async functions."
Taking It to Production: Servers, APIs, and Real-World Deployment
Building a demo is easy. Running it in production where real businesses depend on it? That's different.
Here's how to connect this reactive invoice processor to actual systems:
Option 1: REST API Endpoint
Wrap your agent in an Express server and expose it as an API for document processing:
import express from "express";
const app = express();
app.use(express.json());
app.post("/api/process-invoice", async (req, res) => {
const agent = createInvoiceAgent();
await agent.start();
agent.setState({ documentContent: req.body.document });
// Wait for processing
await agent.settle();
const result = agent.getState();
res.json(result);
});
app.listen(3000, () => {
console.log("Invoice processor API running on port 3000");
});
Now you have a document intelligence API. POST an invoice, GET structured data. Simple.
Option 2: File Upload Watcher
Monitor a folder for new invoices and process them automatically:
import chokidar from "chokidar";
import fs from "fs/promises";
// Watch for new files
chokidar.watch("/uploads/invoices/*.pdf").on("add", async (filePath) => {
  console.log(`New invoice detected: ${filePath}`);
  // Extract text from PDF
  const documentContent = await extractTextFromPDF(filePath);
  // Process it
  const agent = createInvoiceAgent();
  await agent.start();
  agent.setState({ documentContent });
  await agent.settle();
  // Save results
  const result = agent.getState();
  await fs.writeFile(
    filePath.replace(".pdf", "_processed.json"),
    JSON.stringify(result, null, 2)
  );
});
Drop invoices into a folder, get JSON files out. Perfect for batch processing or integrating with document management systems.
Option 3: Cloud Storage Trigger (AWS S3, Google Cloud Storage)
React to file uploads in cloud storage buckets:
// AWS Lambda example (AWS SDK v2 style)
import { S3 } from "aws-sdk";
import type { S3Event } from "aws-lambda";

const s3 = new S3();

export const handler = async (event: S3Event) => {
  const bucket = event.Records[0].s3.bucket.name;
  // Object keys arrive URL-encoded in S3 events, with spaces as "+"
  const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));
  // Download file
  const file = await s3.getObject({ Bucket: bucket, Key: key }).promise();
  const documentContent = file.Body?.toString("utf-8") ?? "";
  // Process
  const agent = createInvoiceAgent();
  await agent.start();
  agent.setState({ documentContent });
  await agent.settle();
  // Save to database
  const result = agent.getState();
  await saveToDatabase(result);
};
Upload an invoice to S3, trigger a Lambda function, process it automatically. Scale to thousands of invoices without thinking about servers.
Option 4: Webhook Integration
Connect to third-party services that send you documents via webhook:
app.post("/webhook/receipt-bank", async (req, res) => {
  const { documentUrl, metadata } = req.body;
  // Fetch document
  const response = await fetch(documentUrl);
  const documentContent = await response.text();
  // Process
  const agent = createInvoiceAgent();
  await agent.start();
  agent.setState({ documentContent, metadata });
  // Acknowledge immediately
  res.status(202).json({ status: "processing" });
  // Process async
  await agent.settle();
  const result = agent.getState();
  // Callback to their system
  await fetch(metadata.callbackUrl, {
    method: "POST",
    // Tell the receiver the body is JSON
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(result),
  });
});
Integrate with accounting software, document scanning apps, email parsing services - anything that can POST JSON to an endpoint.
Option 5: Background Job Queue
For high-volume processing, use a job queue:
import Bull from "bull";
const invoiceQueue = new Bull("invoice-processing");
// Producer: Add jobs
app.post("/api/queue-invoice", async (req, res) => {
await invoiceQueue.add({ document: req.body.document });
res.json({ status: "queued" });
});
// Consumer: Process jobs
invoiceQueue.process(async (job) => {
const agent = createInvoiceAgent();
await agent.start();
agent.setState({ documentContent: job.data.document });
await agent.settle();
return agent.getState();
});
Rate limiting, retry logic, concurrency control - all handled by the queue. Your agent just processes documents.
Real-World Architecture Example
Here's how I deploy this for clients:
Client's System
↓
REST API (Express + TypeScript)
↓
Job Queue (Bull/BullMQ)
↓
Worker Process (Agentiny Agent)
↓
Database (PostgreSQL)
↓
Webhook back to client
Clients POST documents to the API. API queues them. Workers process them using agentiny agents. Results go to the database and webhook back to the client. Clean separation of concerns. Scales horizontally. Easy to monitor.
The reactive agent pattern works at any scale - from a simple Node script to a distributed processing cluster.
Real-World Extensions: Beyond Basic Invoice Processing
This pattern scales to complex workflows. Here are extensions I've built for actual client projects:
Add Approval Routing
agent.when((state) => state.status === "flagged", [sendToManagerForApproval]);
High-value invoices or anomalies go to a human for review. Everything else gets processed automatically.
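Something has to set `status` to "flagged" in the first place. As a rough sketch, that decision can be a small pure function run once anomaly detection finishes. The name `decideStatus`, the "approved" status, and the 5000 threshold are all assumptions for illustration; pick values that match your client's approval policy.

```typescript
interface ReviewInput {
  total: number;
  anomalies: string[];
}

type InvoiceStatus = "flagged" | "approved";

// Hypothetical rule: flag anything at or above the threshold, or with any anomaly.
function decideStatus(invoice: ReviewInput, approvalThreshold = 5000): InvoiceStatus {
  const needsReview =
    invoice.total >= approvalThreshold || invoice.anomalies.length > 0;
  return needsReview ? "flagged" : "approved";
}
```

Keeping the rule outside the model prompt makes the routing policy easy to audit and change without re-testing the AI stages.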
Add Duplicate Detection
agent.once((state) => !!state.extractedData, [checkAgainstDatabase]);
Check if you've already paid this invoice. Prevents duplicate payments (saves money, makes you look competent).
Add Multi-Document Processing
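const batch = [invoice1, invoice2, invoice3];
// One agent per document; start each, feed it, and wait for all to settle
const results = await Promise.all(
  batch.map(async (doc) => {
    const agent = createInvoiceAgent();
    await agent.start();
    agent.setState({ documentContent: doc });
    await agent.settle();
    return agent.getState();
  })
);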
Process hundreds of invoices in parallel. Each gets its own agent instance. No shared state, no race conditions.
Add Database Persistence
agent.when(
(state) => !!state.report,
[
async (state) => {
await db.invoices.create({
vendor: state.extractedData.vendor,
total: state.extractedData.total,
category: state.category,
report: state.report,
});
},
]
);
Automatically save processed invoices to your database. No extra code in your main pipeline.
Add Email Notifications
agent.when(
  (state) => state.anomalies && state.anomalies.length > 0,
  [
    async (state) => {
      await sendEmail({
        to: "[email protected]",
        subject: "Invoice Anomaly Detected",
        body: `Issues found: ${state.anomalies.join(", ")}`,
      });
    },
  ]
);
Alert your team when something fishy shows up. Reactive notifications without cluttering your business logic.
Performance Considerations for Production AI Pipelines
Real talk about running AI document processing at scale:
Model Selection Matters
- Claude Haiku 4.5: Use for simple extraction tasks. Cheap, fast, good enough for 80% of documents.
- Claude Sonnet 4.5: Use for complex reasoning, anomaly detection, multi-step validation. Smarter, slower, pricier.
- Mix them: Use Haiku for extraction, Sonnet for validation. Best of both worlds.
Parallel Processing Works
Independent stages can run in parallel. If validation doesn't depend on categorization, run them simultaneously:
agent.once(hasExtractedData, [validateData, categorizeData]);
Both actions fire at once. If the two calls take about the same time, that roughly halves the wait for those two stages.
Cache Everything You Can
const cache = new Map();
agent.once(hasDocument, [
async (state) => {
const hash = hashDocument(state.documentContent);
if (cache.has(hash)) {
state.extractedData = cache.get(hash);
return;
}
// Extract if not cached...
},
]);
Duplicate documents? Don't waste API calls. Check the cache first.
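The snippet above calls `hashDocument` without defining it. One possible definition, assuming you are on Node and a SHA-256 of the normalized text is good enough as a cache key:

```typescript
import { createHash } from "node:crypto";

// One possible hashDocument: SHA-256 over whitespace-normalized text.
// Normalizing line endings and trimming means trivial differences still hit the cache.
function hashDocument(content: string): string {
  const normalized = content.replace(/\r\n/g, "\n").trim();
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}
```

This only catches exact (post-normalization) duplicates. A rescanned copy of the same invoice will produce different text and a different hash, so it complements rather than replaces a database duplicate check.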
Rate Limiting Is Your Friend
import pLimit from "p-limit";
const limit = pLimit(5); // Max 5 concurrent API calls
const batch = documents.map((doc) => limit(() => processDocument(doc)));
await Promise.all(batch);
Anthropic's API has rate limits. Respect them. Queue your requests.
Error Recovery
agent.when((state) => state.error && state.retryCount < 3, [retryWithBackoff]);
API call failed? Retry with exponential backoff. Don't lose documents because of a timeout.
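The `retryWithBackoff` action itself isn't shown here. As a sketch of the underlying idea, here is a generic helper you could wrap around any flaky call. The names `backoffDelayMs` and `withRetries` and the default timings are assumptions, not part of @agentiny.

```typescript
// Delay doubles with each attempt (500, 1000, 2000, ...) and is capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Run the task; on failure wait and try again, up to maxRetries extra attempts.
async function withRetries<T>(
  task: () => Promise<T>,
  maxRetries = 3,
  baseMs = 500
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      // Out of retries: surface the last error to the caller.
      if (attempt >= maxRetries) throw error;
      const delay = backoffDelayMs(attempt, baseMs);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

In production you would usually add some random jitter to the delay so that a batch of failed requests doesn't retry in lockstep.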
Try It Yourself: Get Started in 5 Minutes
# Clone the example
git clone https://github.com/Keldrik/agentiny-invoice-processor
cd agentiny-invoice-processor
# Install dependencies
npm install
# Set your API key
export DEV_ANTHROPIC_API_KEY=your_key
# Run it
npm start
Or start from scratch:
npm install @agentiny/core
The GitHub repo has complete examples, including production deployment templates.
What's Next? Where This Pattern Shines
This invoice processor is just the beginning. The same reactive pattern works for:
- Contract Analysis - Extract terms, identify risks, flag unfavorable clauses
- Resume Parsing - Structure candidate data, extract skills, match to job descriptions
- Form Extraction - Pull data from PDFs, validate completeness, route to systems
- Document Classification - Route documents automatically based on content
- Compliance Checking - Flag regulatory issues, identify missing information
- Medical Records Processing - Extract patient data, validate against protocols
- Legal Document Review - Identify clauses, extract dates, check completeness
- Expense Report Automation - Validate receipts, categorize expenses, detect policy violations
Any time you have documents that need multi-step AI processing, this pattern handles it elegantly.
The Code: Everything You Need
Full working example: GitHub Repository
Key files:
- invoice-processor.ts - Complete pipeline with all stages
- simple-example.ts - Minimal starter (30 lines)
- production-example.ts - API server with queue integration
- README.md - Full documentation and deployment guides
Conclusion: Stop Fighting Your Document Processing Code
Traditional document processing requires manual orchestration, complex error handling, and brittle step coordination. You end up with code that's hard to test, harder to change, and impossible to debug.
With reactive agents, you define when things should happen, and the framework handles how. The result? Less code, better maintainability, and a system that's easy to test, extend, and actually works in production.
Your document processing pipelines don't have to suck. Give this pattern a try.
About @agentiny: Built for Real Projects
A lightweight TypeScript framework for building reactive agent systems. I built it because I needed something simple and actually usable for client work. Zero dependencies, fully typed, production-ready.
It's under active development - I'm adding features based on real project needs. If you have ideas, the GitHub issues are open.
- GitHub: github.com/Keldrik/agentiny
- NPM: @agentiny/core
Try the example:
- Source code: GitHub
Questions or feedback? Drop a comment below or open an issue on GitHub. I actually respond to them.