AI Invoice Processing with TypeScript: Reactive Agent Tutorial

TL;DR: Stop writing spaghetti code to process invoices. Learn how to build an intelligent document processing pipeline that automatically extracts, validates, categorizes, and flags invoices - all without writing orchestration code. Using the @agentiny framework, we'll process invoices in a reactive, maintainable way. Works for invoices, receipts, contracts, resumes, and any document that makes you want to pull your hair out.

The Problem with Traditional Document Processing (Or: Why Your Invoice Code Probably Sucks)

Let me guess - you've written code like this before:

async function processInvoice(document: string) {
  try {
    const extracted = await callClaude("extract data", document);
    const validated = await callClaude("validate data", extracted);
    const categorized = await callClaude("categorize", validated);
    const anomalies = await callClaude("detect issues", categorized);
    const report = await callClaude("generate report", anomalies);
    return report;
  } catch (error) {
    // Handle errors for each step?
  }
}

If you've ever built an AI document processing system, automated invoice extraction, or tried to add intelligence to document workflows, you know this code. We've all written it. And we've all regretted it six months later when the business wants to add "just one more validation step."

This works, but it has problems:

Brittle as hell: Change one step, rewrite the whole function. Want to add duplicate detection? Good luck finding where to slot it in.
Hard to test: Each step is coupled to the next. Testing extraction means running the entire pipeline.
Error-prone: Complex error handling at each step. Did the validation fail or did the API timeout? Who knows!
Not reusable: Orchestration logic is welded to business logic. Want to use the same validation in another pipeline? Copy-paste time.
Debugging nightmare: "It broke somewhere in the middle" is not a useful error message.

The worst part? This is supposed to be the simple version. Production document intelligence systems get way gnarlier. Add parallel processing, retries, webhooks, database persistence, audit logs, and approval workflows, and suddenly you're maintaining a 500-line async function that nobody wants to touch.

A Better Way: Reactive Agents for Document Automation

What if instead of telling the computer how to process documents step-by-step, you could just define when each step should happen?

// When we have a document, extract data
agent.once(hasDocument, [extractData]);

// When we have extracted data, validate it
agent.once(hasExtractedData, [validateData]);

// When validated, categorize
agent.once(hasValidation, [categorize]);

// And so on...

Then to process a document:

agent.setState({ documentContent: invoice });
// Everything else happens automatically!

This is the reactive agent pattern, and it's what we'll build today. No more orchestration spaghetti. No more "step 7 of 12" comments. Just clean, declarative rules that fire when conditions are met.
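To make the pattern concrete, here is a minimal, self-contained sketch of how a `once`-style trigger loop could work. It is not @agentiny's implementation: `MiniAgent` and everything inside it are hypothetical and exist only to show the idea that every state change re-checks the rules that have not fired yet.

```typescript
// Hypothetical illustration only; @agentiny's internals may differ.
type Condition<S> = (state: S) => boolean;
type Action<S> = (state: S) => void | Promise<void>;

class MiniAgent<S extends object> {
  private state: S;
  private rules: { when: Condition<S>; actions: Action<S>[]; fired: boolean }[] = [];
  private running: Promise<void> = Promise.resolve();

  constructor(initialState: S) {
    this.state = initialState;
  }

  // Register a rule that fires at most once.
  once(when: Condition<S>, actions: Action<S>[]): void {
    this.rules.push({ when, actions, fired: false });
  }

  // Merge a patch into state, then queue a re-evaluation of the rules.
  setState(patch: Partial<S>): void {
    Object.assign(this.state, patch);
    this.running = this.running.then(() => this.evaluate());
  }

  // Resolves when the work queued so far has finished.
  settle(): Promise<void> {
    return this.running;
  }

  getState(): S {
    return this.state;
  }

  // Keep sweeping the rules until a full pass fires nothing new.
  private async evaluate(): Promise<void> {
    let progressed = true;
    while (progressed) {
      progressed = false;
      for (const rule of this.rules) {
        if (rule.fired || !rule.when(this.state)) continue;
        rule.fired = true;
        for (const action of rule.actions) await action(this.state);
        progressed = true;
      }
    }
  }
}

// Usage: rules are registered "out of order" on purpose.
interface Doc { text?: string; words?: number; summary?: string }

const agent = new MiniAgent<Doc>({});
agent.once((s) => !!s.words && !s.summary, [(s) => { s.summary = `${s.words} words`; }]);
agent.once((s) => !!s.text && !s.words, [async (s) => { s.words = s.text!.split(/\s+/).length; }]);

agent.setState({ text: "net 30 days" });
agent.settle().then(() => console.log(agent.getState().summary)); // logs "3 words"
```

The order in which the rules are registered doesn't matter here. The conditions alone decide what runs next, which is the whole point of the pattern.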

Why I Built @agentiny (And Why You Might Need It Too)

I'm Thomas, and I run a web development business in Sydney.

I looked at existing agent frameworks. They were either:

Too heavy - Installing half of npm to process a PDF
Over-engineered - "Just read these 47 documentation pages..."
Vendor lock-in - Married to a specific AI provider or cloud platform

I needed something tiny, simple, and actually usable for real client projects. Something I could drop into a TypeScript codebase and have working in 10 minutes.

So I built @agentiny.

It has:

Zero dependencies
Type-safe - Full TypeScript support. Your IDE actually helps you.
Simple API - Triggers, conditions, actions. That's it.
Async-first - Built for modern async workflows and AI API calls.
Framework agnostic - Works with React, Vue, Node.js, Deno, Bun, whatever.

Perfect for document intelligence, workflow automation, event-driven systems, and anywhere you need AI to do things when conditions are met.

I'm actively improving it and adding features based on real project needs. If you have ideas or hit edge cases, the GitHub issues are open - I actually respond to them.

Building the Invoice Processor: From Zero to Production

Let's build a production-ready invoice processor that actually solves real problems:

Extracts structured data from invoices (no more manual data entry)
Validates the math (catches vendor mistakes)
Categorizes expenses (automatic accounting)
Detects anomalies (flags suspicious invoices)
Generates executive summaries (for people who don't read invoices)

Step 1: Define Your State

interface InvoiceState {
  documentContent?: string;
  extractedData?: {
    vendor: string;
    items: Array<{ description: string; total: number }>;
    total: number;
  };
  validationResults?: {
    mathCorrect: boolean;
    issues: string[];
  };
  category?: string;
  anomalies?: string[];
  report?: string;
}

This is your pipeline's data structure. Each stage adds more information. Think of it as a document moving through an assembly line, getting processed at each station.
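Because each stage only ever adds a field, you can tell how far a document has travelled just by looking at which fields are present. The helper below is a small sketch of that idea; `currentStage` and the stage names are hypothetical and not part of @agentiny.

```typescript
// Same shape as the InvoiceState interface above.
interface InvoiceState {
  documentContent?: string;
  extractedData?: {
    vendor: string;
    items: Array<{ description: string; total: number }>;
    total: number;
  };
  validationResults?: { mathCorrect: boolean; issues: string[] };
  category?: string;
  anomalies?: string[];
  report?: string;
}

type Stage =
  | "empty" | "loaded" | "extracted" | "validated"
  | "categorized" | "checked" | "reported";

// Check fields from the last stage back to the first and
// report the furthest station the document has reached.
function currentStage(state: InvoiceState): Stage {
  if (state.report !== undefined) return "reported";
  if (state.anomalies !== undefined) return "checked";
  if (state.category !== undefined) return "categorized";
  if (state.validationResults !== undefined) return "validated";
  if (state.extractedData !== undefined) return "extracted";
  if (state.documentContent !== undefined) return "loaded";
  return "empty";
}
```

Something like this is handy for progress indicators or log lines, since it needs nothing beyond the state object itself.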

Step 2: Create AI-Powered Actions

const extractDataAction = createAnthropicAction<InvoiceState>(
  { apiKey: process.env.ANTHROPIC_API_KEY! },
  {
    prompt: (state) => `
      Extract structured data from this invoice:
      ${state.documentContent}

      Return JSON with: vendor, items[], subtotal, tax, total
    `,
    onResponse: (response, state) => {
      state.extractedData = JSON.parse(response);
    },
  }
);

Each action is an async function that reads state, calls an AI model, and writes its result back to state. Small, testable, reusable.
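One caveat with calling `JSON.parse` directly on a model reply: models sometimes wrap JSON in a code fence or add a sentence around it, and then the parse throws. A minimal sketch of a more forgiving parser, assuming the reply contains exactly one JSON object; `parseInvoiceJson` is a hypothetical helper, not something @agentiny provides.

```typescript
interface ExtractedInvoice {
  vendor: string;
  items: Array<{ description: string; total: number }>;
  total: number;
}

// Slice from the first "{" to the last "}" so surrounding prose or
// code fences are ignored, then check the basic shape before trusting it.
function parseInvoiceJson(reply: string): ExtractedInvoice {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model reply");
  }

  const data: unknown = JSON.parse(reply.slice(start, end + 1));
  if (typeof data !== "object" || data === null) {
    throw new Error("Reply is not a JSON object");
  }

  const d = data as Record<string, unknown>;
  if (typeof d.vendor !== "string") throw new Error("Missing or invalid 'vendor'");
  if (typeof d.total !== "number") throw new Error("Missing or invalid 'total'");
  if (!Array.isArray(d.items)) throw new Error("Missing or invalid 'items'");

  return d as unknown as ExtractedInvoice;
}
```

Inside `onResponse` you could then write `state.extractedData = parseInvoiceJson(response)`, so a malformed reply fails with a message that names the missing field.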

Step 3: Define Triggers (Where the Magic Happens)

// Extract when document loads
agent.once(
  (state) => !!state.documentContent && !state.extractedData,
  [extractDataAction]
);

// Validate after extraction
agent.once(
  (state) => !!state.extractedData && !state.validationResults,
  [validateDataAction]
);

// Continue the chain...

Triggers fire automatically when their conditions become true. No manual orchestration needed. Add a new step? Just add another trigger. Remove a step? Delete the trigger. The rest of the pipeline doesn't care.
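The shorter snippets earlier in this post use names like `hasDocument` and `hasExtractedData` without defining them. One way to write them, as a sketch: each predicate is true only while its stage is due, meaning the input is present and the output has not been written yet. `hasValidation` checking `category` is an assumption about which stage comes next.

```typescript
// Trimmed copy of InvoiceState with just the fields these predicates read.
interface InvoiceState {
  documentContent?: string;
  extractedData?: { vendor: string; total: number };
  validationResults?: { mathCorrect: boolean; issues: string[] };
  category?: string;
}

type Predicate = (state: InvoiceState) => boolean;

// Input present, output missing: the stage still needs to run.
const hasDocument: Predicate = (s) => !!s.documentContent && !s.extractedData;
const hasExtractedData: Predicate = (s) => !!s.extractedData && !s.validationResults;
const hasValidation: Predicate = (s) => !!s.validationResults && !s.category;
```

Named predicates read better in trigger definitions and can be unit-tested with plain objects, no agent required.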

Step 4: Start Processing

await agent.start();

// Feed in a document - the pipeline executes automatically
agent.setState({
  documentContent: invoiceText,
});

// Wait for completion
await agent.settle();

const result = agent.getState();
console.log(result.report); // Executive summary ready!

That's it. Drop in a document, get out structured, validated, categorized data with anomaly detection and a summary report.

The Complete Pipeline: Invoice Processing on Autopilot

Here's what happens automatically once you set documentContent:

Document loaded
    ↓
Extract structured data (vendor, items, totals)
    ↓
Validate math (check calculations)
    ↓
Categorize expense (Office, Travel, etc.)
    ↓
Detect anomalies (high amounts, errors)
    ↓
Generate report (executive summary)

Each arrow is a trigger that fires when the previous stage completes. The computer figures out the order. You just define the rules.
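Not every stage in that flow has to be an AI call. The "validate math" step in particular can be checked deterministically in plain TypeScript, leaving the model for fuzzier judgments. A minimal sketch, assuming the extraction step also returns a `tax` amount (the prompt asks for one, although the `InvoiceState` interface does not declare it); `checkInvoiceMath` is a hypothetical helper.

```typescript
interface LineItem { description: string; total: number }
interface MathCheck { mathCorrect: boolean; issues: string[] }

// Work in integer cents so floating-point drift (0.1 + 0.2) cannot cause false alarms.
const toCents = (amount: number): number => Math.round(amount * 100);

// Compare the sum of line items plus tax against the stated invoice total.
function checkInvoiceMath(items: LineItem[], tax: number, total: number): MathCheck {
  const itemsCents = items.reduce((sum, item) => sum + toCents(item.total), 0);
  const expectedCents = itemsCents + toCents(tax);
  const issues: string[] = [];

  if (expectedCents !== toCents(total)) {
    issues.push(
      `Line items plus tax come to ${(expectedCents / 100).toFixed(2)}, ` +
        `but the invoice total is ${total.toFixed(2)}`
    );
  }
  return { mathCorrect: issues.length === 0, issues };
}
```

The return value has the same shape as `validationResults`, so an action could assign it straight into state.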

Sample Output: What You Actually Get

📊 INVOICE PROCESSING RESULTS

Vendor: ACME Office Supplies Inc.
Total: $222.88
Category: Office Supplies (98% confidence)

✅ Validation: All calculations correct
⚠️ Anomalies: None detected

📝 Summary:
ACME Office Supplies provided printer supplies and office
materials totaling $222.88. All calculations verified.
Standard office expense approved for processing.

Clean, structured, ready for your accounting system or database. No human had to read this invoice. No human had to type in the vendor name. No human had to double-check the math. The AI did it all.

Why This Approach Wins (And Traditional Approaches Lose)

1. Separation of Concerns

Each stage is independent. Change validation logic without touching extraction. Add a new categorization model without rewriting the anomaly detector. Your code becomes modular instead of a monolith.

2. Easy Testing

test("extraction works", async () => {
  const state: InvoiceState = { documentContent: "test invoice" };
  // Actions are async, so await them before asserting
  await extractDataAction(state);
  expect(state.extractedData).toBeDefined();
});

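Test each action in isolation: pass in a state object, await the action, and assert on what it wrote. No pipeline setup, no teardown. The only thing you may want to stub is the AI call itself, so tests stay fast and don't spend API credits.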

3. Incremental Development

Add stages one at a time. They integrate automatically.

// Later, add duplicate detection
agent.once(hasExtraction, [checkDuplicates]);

Ship the MVP with basic extraction and validation. Add categorization next sprint. Add anomaly detection when you have time. Each addition is a few lines of code, not a rewrite.

4. Clear Error Handling

const agent = new Agent({
  initialState: {},
  onError: (error) => {
    logToService(error);
    notifyTeam(error);
  },
});

Errors get caught at the framework level. No try-catch blocks scattered everywhere. One place to handle failures.

5. Observable and Debuggable

Built-in logging shows exactly what's happening:

✓ Data extracted successfully
✓ Validation completed
✓ Categorized as: Office Supplies (98% confidence)
✓ No anomalies detected
✓ Report generated

When something breaks (and it will), you know exactly which stage failed. Debugging becomes "fix the broken stage" instead of "trace through 200 lines of async functions."

Taking It to Production: Servers, APIs, and Real-World Deployment

Building a demo is easy. Running it in production where real businesses depend on it? That's different.

Here's how to connect this reactive invoice processor to actual systems:

Option 1: REST API Endpoint

Wrap your agent in an Express server and expose it as an API for document processing:

import express from "express";

const app = express();
app.use(express.json());

app.post("/api/process-invoice", async (req, res) => {
  const agent = createInvoiceAgent();
  await agent.start();

  agent.setState({ documentContent: req.body.document });

  // Wait for processing
  await agent.settle();

  const result = agent.getState();
  res.json(result);
});

app.listen(3000, () => {
  console.log("Invoice processor API running on port 3000");
});

Now you have a document intelligence API. POST an invoice, GET structured data. Simple.
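Before handing `req.body.document` to the agent, it's worth rejecting malformed requests up front. The sketch below is one way to do that in plain TypeScript; `parseProcessRequest` and the size limit are assumptions for illustration, not part of Express or @agentiny.

```typescript
type ParsedRequest =
  | { ok: true; document: string }
  | { ok: false; error: string };

// Assumed upper bound; tune it to the documents you actually receive.
const MAX_DOCUMENT_CHARS = 200_000;

// Validate an untrusted JSON body and return either the document or a reason for rejecting it.
function parseProcessRequest(body: unknown): ParsedRequest {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "Request body must be a JSON object" };
  }
  const document = (body as Record<string, unknown>).document;
  if (typeof document !== "string" || document.trim() === "") {
    return { ok: false, error: "'document' must be a non-empty string" };
  }
  if (document.length > MAX_DOCUMENT_CHARS) {
    return { ok: false, error: `'document' exceeds ${MAX_DOCUMENT_CHARS} characters` };
  }
  return { ok: true, document };
}
```

In the handler, call `parseProcessRequest(req.body)` first and respond with `res.status(400).json({ error })` when `ok` is false. That way a bad request never costs an API call.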

Option 2: File Upload Watcher

Monitor a folder for new invoices and process them automatically:

import chokidar from "chokidar";
import fs from "fs/promises";

// Watch for new files
chokidar.watch("/uploads/invoices/*.pdf").on("add", async (filePath) => {
  console.log(`New invoice detected: ${filePath}`);

  // Extract text from PDF
  const documentContent = await extractTextFromPDF(filePath);

  // Process it
  const agent = createInvoiceAgent();
  await agent.start();
  agent.setState({ documentContent });

  await agent.settle();

  // Save results
  const result = agent.getState();
  await fs.writeFile(
    filePath.replace(".pdf", "_processed.json"),
    JSON.stringify(result, null, 2)
  );
});

Drop invoices into a folder, get JSON files out. Perfect for batch processing or integrating with document management systems.

Option 3: Cloud Storage Trigger (AWS S3, Google Cloud Storage)

React to file uploads in cloud storage buckets:

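// AWS Lambda example (AWS SDK v2)
import type { S3Event } from "aws-lambda";
import AWS from "aws-sdk";

const s3 = new AWS.S3();

export const handler = async (event: S3Event) => {
  const bucket = event.Records[0].s3.bucket.name;
  // Object keys arrive URL-encoded in S3 event notifications
  const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));

  // Download file
  const file = await s3.getObject({ Bucket: bucket, Key: key }).promise();
  const documentContent = file.Body?.toString("utf-8") ?? "";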

  // Process
  const agent = createInvoiceAgent();
  await agent.start();
  agent.setState({ documentContent });

  await agent.settle();

  // Save to database
  const result = agent.getState();
  await saveToDatabase(result);
};

Upload an invoice to S3, trigger a Lambda function, process it automatically. Scale to thousands of invoices without thinking about servers.

Option 4: Webhook Integration

Connect to third-party services that send you documents via webhook:

app.post("/webhook/receipt-bank", async (req, res) => {
  const { documentUrl, metadata } = req.body;

  // Fetch document
  const response = await fetch(documentUrl);
  const documentContent = await response.text();

  // Process
  const agent = createInvoiceAgent();
  await agent.start();
  agent.setState({ documentContent, metadata });

  // Acknowledge immediately
  res.status(202).json({ status: "processing" });

  // Process async
  await agent.settle();
  const result = agent.getState();

  // Callback to their system
  await fetch(metadata.callbackUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(result),
  });
});

Integrate with accounting software, document scanning apps, email parsing services - anything that can POST JSON to an endpoint.

Option 5: Background Job Queue

For high-volume processing, use a job queue:

import Bull from "bull";

const invoiceQueue = new Bull("invoice-processing");

// Producer: Add jobs
app.post("/api/queue-invoice", async (req, res) => {
  await invoiceQueue.add({ document: req.body.document });
  res.json({ status: "queued" });
});

// Consumer: Process jobs
invoiceQueue.process(async (job) => {
  const agent = createInvoiceAgent();
  await agent.start();
  agent.setState({ documentContent: job.data.document });

  await agent.settle();

  return agent.getState();
});

Rate limiting, retry logic, concurrency control - all handled by the queue. Your agent just processes documents.

Real-World Architecture Example

Here's how I deploy this for clients:

Client's System
    ↓
  REST API (Express + TypeScript)
    ↓
  Job Queue (Bull/BullMQ)
    ↓
  Worker Process (Agentiny Agent)
    ↓
  Database (PostgreSQL)
    ↓
  Webhook back to client

Clients POST documents to the API. API queues them. Workers process them using agentiny agents. Results go to the database and webhook back to the client. Clean separation of concerns. Scales horizontally. Easy to monitor.

The reactive agent pattern works at any scale - from a simple Node script to a distributed processing cluster.

Real-World Extensions: Beyond Basic Invoice Processing

This pattern scales to complex workflows. Here are extensions I've built for actual client projects:

Add Approval Routing

agent.when((state) => state.status === "flagged", [sendToManagerForApproval]);

High-value invoices or anomalies go to a human for review. Everything else gets processed automatically.

Add Duplicate Detection

agent.once((state) => !!state.extractedData, [checkAgainstDatabase]);

Check if you've already paid this invoice. Prevents duplicate payments (saves money, makes you look competent).
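What counts as "the same invoice" is up to you. A minimal sketch of one approach: build a normalized fingerprint from the extracted fields and look it up before paying. `invoiceFingerprint` and `isDuplicate` are hypothetical helpers, and vendor plus total is a deliberately simple key; in practice you would add an invoice number or date if your extraction returns one.

```typescript
interface InvoiceKeyFields { vendor: string; total: number }

// Normalize the identifying fields so trivial differences
// (case, extra spaces, 10 vs 10.00) do not hide a duplicate.
function invoiceFingerprint(invoice: InvoiceKeyFields): string {
  const vendor = invoice.vendor.trim().toLowerCase().replace(/\s+/g, " ");
  return `${vendor}|${invoice.total.toFixed(2)}`;
}

// In-memory stand-in for a database lookup.
const seenFingerprints = new Set<string>();

// Returns true if this invoice was seen before; otherwise records it and returns false.
function isDuplicate(invoice: InvoiceKeyFields): boolean {
  const fingerprint = invoiceFingerprint(invoice);
  if (seenFingerprints.has(fingerprint)) return true;
  seenFingerprints.add(fingerprint);
  return false;
}
```

A real `checkAgainstDatabase` action would swap the `Set` for a unique index or a query on the fingerprint column.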

Add Multi-Document Processing

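const batch = [invoice1, invoice2, invoice3];

// One agent per document, all running concurrently
const results = await Promise.all(
  batch.map(async (doc) => {
    const agent = createInvoiceAgent();
    await agent.start();
    agent.setState({ documentContent: doc });
    await agent.settle();
    return agent.getState();
  })
);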

Process hundreds of invoices in parallel. Each gets its own agent instance. No shared state, no race conditions.

Add Database Persistence

agent.when(
  (state) => !!state.report,
  [
    async (state) => {
      await db.invoices.create({
        vendor: state.extractedData.vendor,
        total: state.extractedData.total,
        category: state.category,
        report: state.report,
      });
    },
  ]
);

Automatically save processed invoices to your database. No extra code in your main pipeline.

Add Email Notifications

agent.when(
  (state) => state.anomalies && state.anomalies.length > 0,
  [
    async (state) => {
      await sendEmail({
        to: "[email protected]",
        subject: "Invoice Anomaly Detected",
        body: `Issues found: ${state.anomalies.join(", ")}`,
      });
    },
  ]
);

Alert your team when something fishy shows up. Reactive notifications without cluttering your business logic.

Performance Considerations for Production AI Pipelines

Real talk about running AI document processing at scale:

Model Selection Matters

Claude Haiku 4.5: Use for simple extraction tasks. Cheap, fast, good enough for 80% of documents.
Claude Sonnet 4.5: Use for complex reasoning, anomaly detection, multi-step validation. Smarter, slower, pricier.
Mix them: Use Haiku for extraction, Sonnet for validation. Best of both worlds.

Parallel Processing Works

Independent stages can run in parallel. If validation doesn't depend on categorization, run them simultaneously:

agent.once(hasExtractedData, [validateData, categorizeData]);

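Both actions fire as soon as extraction finishes. If they run concurrently and take about the same time, that step takes roughly half as long as running them back to back.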
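Outside the framework, the same idea in plain TypeScript is `Promise.all`: start both calls before awaiting either one. The sketch below uses timer-based stand-ins for the two AI calls; `validate`, `categorize`, and `validateAndCategorize` are hypothetical names, and whether a framework runs a trigger's actions concurrently is something to confirm in its docs.

```typescript
interface Extracted { vendor: string; total: number }

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Stand-ins for two independent AI calls; each only reads the extracted data.
async function validate(data: Extracted): Promise<{ mathCorrect: boolean }> {
  await sleep(50);
  return { mathCorrect: data.total >= 0 };
}

async function categorize(data: Extracted): Promise<{ category: string }> {
  await sleep(50);
  return { category: data.vendor.includes("Office") ? "Office Supplies" : "Other" };
}

// Both promises are created before either is awaited, so the total
// wait is about as long as the slower call rather than the sum of both.
async function validateAndCategorize(data: Extracted) {
  const [validation, categorization] = await Promise.all([
    validate(data),
    categorize(data),
  ]);
  return { ...validation, ...categorization };
}
```

This only works because neither function reads the other's output. If categorization needed the validation result, the two would have to run in sequence.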

Cache Everything You Can

const cache = new Map();

agent.once(hasDocument, [
  async (state) => {
    const hash = hashDocument(state.documentContent);
    if (cache.has(hash)) {
      state.extractedData = cache.get(hash);
      return;
    }
    // Extract if not cached...
  },
]);

Duplicate documents? Don't waste API calls. Check the cache first.
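The snippet above leaves the "extract if not cached" branch open. One way to close it is to wrap the extractor itself, as in this sketch. `withCache` is a hypothetical helper; `keyOf` is whatever fingerprint function you trust (the `hashDocument` mentioned above could be passed in, for example a SHA-256 of the text).

```typescript
// Wrap an expensive extractor so identical documents are only processed once.
function withCache<T>(
  keyOf: (doc: string) => string,
  extract: (doc: string) => Promise<T>
): (doc: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();

  return (doc) => {
    const key = keyOf(doc);
    const hit = cache.get(key);
    if (hit) return hit;

    // Cache the promise itself so concurrent calls for the
    // same document share a single in-flight request.
    const pending = extract(doc).catch((error) => {
      cache.delete(key); // never cache a failure
      throw error;
    });
    cache.set(key, pending);
    return pending;
  };
}
```

Storing the promise rather than the resolved value matters in batch runs: two copies of the same invoice arriving at once trigger one API call, not two.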

Rate Limiting Is Your Friend

import pLimit from "p-limit";

const limit = pLimit(5); // Max 5 concurrent API calls

const batch = documents.map((doc) => limit(() => processDocument(doc)));

await Promise.all(batch);

Anthropic's API has rate limits. Respect them. Queue your requests.

Error Recovery

agent.when((state) => state.error && state.retryCount < 3, [retryWithBackoff]);

API call failed? Retry with exponential backoff. Don't lose documents because of a timeout.
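The backoff logic itself is independent of any framework. Here is a minimal sketch of a generic helper an action could wrap its API call in; `callWithBackoff` and its default values are assumptions for illustration, not the `retryWithBackoff` action referenced above.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `fn`, retrying up to `retries` more times on failure and
// doubling the wait after each failed attempt.
async function callWithBackoff<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // With the defaults, waits are 500 ms, 1000 ms, then 2000 ms.
      if (attempt < retries) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

In production you would usually retry only on errors that are worth retrying (timeouts, rate limits, 5xx responses) and add some random jitter so a batch of failed requests doesn't retry in lockstep.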

Try It Yourself: Get Started in 5 Minutes

# Clone the example
git clone https://github.com/Keldrik/agentiny-invoice-processor
cd agentiny-invoice-processor

# Install dependencies
npm install

# Set your API key
export DEV_ANTHROPIC_API_KEY=your_key

# Run it
npm start

Or start from scratch:

npm install @agentiny/core

The GitHub repo has complete examples, including production deployment templates.

What's Next? Where This Pattern Shines

This invoice processor is just the beginning. The same reactive pattern works for:

Contract Analysis - Extract terms, identify risks, flag unfavorable clauses
Resume Parsing - Structure candidate data, extract skills, match to job descriptions
Form Extraction - Pull data from PDFs, validate completeness, route to systems
Document Classification - Route documents automatically based on content
Compliance Checking - Flag regulatory issues, identify missing information
Medical Records Processing - Extract patient data, validate against protocols
Legal Document Review - Identify clauses, extract dates, check completeness
Expense Report Automation - Validate receipts, categorize expenses, detect policy violations

Any time you have documents that need multi-step AI processing, this pattern handles it elegantly.

The Code: Everything You Need

Full working example: github.com/Keldrik/agentiny-invoice-processor

Key files:

invoice-processor.ts - Complete pipeline with all stages
simple-example.ts - Minimal starter (30 lines)
production-example.ts - API server with queue integration
README.md - Full documentation and deployment guides

Conclusion: Stop Fighting Your Document Processing Code

Traditional document processing requires manual orchestration, complex error handling, and brittle step coordination. You end up with code that's hard to test, harder to change, and impossible to debug.

With reactive agents, you define when things should happen, and the framework handles how. The result? Less code, better maintainability, and a system that's easy to test, extend, and actually works in production.

Your document processing pipelines don't have to suck. Give this pattern a try.

About @agentiny: Built for Real Projects

A lightweight TypeScript framework for building reactive agent systems. I built it because I needed something simple and actually usable for client work. Zero dependencies, fully typed, production-ready.

It's under active development - I'm adding features based on real project needs. If you have ideas, the GitHub issues are open.

GitHub: github.com/Keldrik/agentiny
NPM: @agentiny/core

Try the example: github.com/Keldrik/agentiny-invoice-processor

Questions or feedback? Drop a comment below or open an issue on GitHub. I actually respond to them.