Build an AI SEO Agent in TypeScript with Claude
Search "AI SEO agent" right now and the first page is mostly screenshots of gumloop and n8n with arrows pointing at boxes. Useful if you want a no-code workflow. Less useful if you write code for a living.
So I built one in TypeScript instead. Around 140 lines, one dependency for the agent framework, calls Claude directly. Point it at a competitor's blog with a topic in mind, and it crawls, scores each page for relevance, and hands back the most relevant ones ranked. The full repo is on GitHub, and there's a mock mode that runs without an API key, which I'll show later.
The interesting bit isn't the crawler. It's the architecture: fetching and LLM scoring run concurrently, not sequentially. While Claude judges page N, the crawler is already pulling page N+1. Plain async/await makes that awkward. A reactive agent makes it fall out for free.
What "AI SEO agent" actually means here
The phrase covers at least three different things. People might mean a content brief generator (give me a keyword, get back an outline). They might mean an autonomous workflow that publishes posts (please don't). Or they might mean a research crawler, which is what this is.
Concrete use case: competitor content gap analysis. You're planning to write about, say, "AI coding tools." A competitor has 200 blog posts. Which of them actually cover that topic well? Reading 200 URLs by hand is not happening. Screaming Frog will tell you which pages exist, but it won't tell you which pages are about what you care about.
That's the gap. An LLM can read a page and judge relevance semantically, not just by keyword overlap. Wire that into a crawler and you get a focused crawl: pages above a relevance threshold contribute their links to the queue, pages below it are dead ends. Five minutes of compute, ranked output.
The output of one of these runs slots cleanly into a normal SEO workflow. Top three competitor pages on the topic? Read them, take notes on what they cover, find what they don't. Score distribution skewed low across the board? You've found a topic gap, write the post. Score distribution clustered tight at the top? Tough niche, you'll need a fresh angle. The crawler doesn't replace judgment, it just makes the input to your judgment manageable.
Why a reactive framework beats sequential await
I'll spend a moment on this because it's the only architectural decision in the project that actually matters.
The naive version
The obvious shape:
for (const url of frontier) {
  const page = await fetchPage(url);
  const { score, links } = await scoreWithClaude(page);
  if (score >= 6) frontier.push(...links);
}

Reads fine. Works. Slow. The problem is that fetch and Claude both block, in serial. While Claude is scoring page N (a second or two of latency), nothing is fetching page N+1. While page N+1 is being fetched, Claude is idle. Throughput is the sum of your latencies, not the max.
For ten pages, that's twenty serialized round-trips, with one side sitting idle during every one of them.
The reactive version
The reactive shape declares two independent triggers. One fetches when there's a URL in the queue and nothing is currently being fetched. The other scores when there's an unscored page and nothing is currently being scored. Both kick off their async work and return immediately, so the agent loop is never blocked. They run concurrently because there's nothing forcing them not to.
The framework I'm using is agentiny, which I built for exactly this kind of thing. (I wrote up a support ticket triage agent with it earlier, if you want a different shape of example.) The model is when(condition, [actions]). When the condition becomes true, the actions run. State changes re-evaluate conditions. That's it.
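In miniature, the model looks like this. A toy sketch: createAgent stands in for the real constructor here (check the agentiny docs for the actual name), but the when shape is the same one the crawler uses below.

import { createAgent } from "@agentiny/core"; // constructor name illustrative

// Count to three, reactively: the action mutates state, the framework
// re-evaluates the condition, the trigger fires again until it's false.
const agent = createAgent({ n: 0 });

agent.when(
  (s) => s.n < 3,
  [
    (s) => {
      s.n += 1; // synchronous mutation is picked up after the action runs
      console.log(`n = ${s.n}`);
    },
  ],
);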
For this crawler, the result is real I/O concurrency. Page 4 fetches while page 3 is being scored while page 2's links are being added to the frontier. You can see it in the timestamps when you run the mock test below.
Architecture: three triggers, two mutexes
State shape
interface CrawlState {
  topic: string;
  origin: string;
  frontier: string[];
  visited: Set<string>;
  pages: Page[];
  fetching: boolean;
  scoring: boolean;
  maxPages: number;
  done: boolean;
}

frontier and visited are standard crawler bookkeeping. pages accumulates results. The two boolean flags act as mutexes: fetching is true while an HTTP request is in flight, scoring is true while Claude is mid-call. They prevent two of the same kind of work happening at once, which keeps the crawl polite. done is set by the stop trigger.
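For orientation, here's roughly what the Page record and a seed state look like, reconstructed from how the triggers use them; treat the exact field names as illustrative rather than the repo's literal shape.

interface Page {
  url: string;
  title: string;    // shown in the [fetch] log lines below
  text: string;     // stripped body text for Claude to judge
  links: string[];  // same-origin URLs found on the page
  score?: number;   // filled in by the score trigger, 1-10
  reason?: string;  // Claude's one-line justification
}

const seed: CrawlState = {
  topic: "AI coding tools",
  origin: "https://competitor.com",
  frontier: ["https://competitor.com"], // the crawl starts at the origin
  visited: new Set(),
  pages: [],
  fetching: false,
  scoring: false,
  maxPages: 10,
  done: false,
};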
The triggers
   ┌─────────────────────────────────────┐
   ▼                                     │
trigger 1: fetch ──→ trigger 2: score ───┘  (if score ≥ 6)
   │                      │
   └──→ trigger 3: stop ◄─┘
        (when nothing in flight, queue drained)

Three triggers, each around 15 lines. Fetch picks up a URL from the frontier and runs an HTTP request. Score takes the next unscored page and asks Claude for a 1 to 10 relevance rating. Stop fires once when everything is quiet.
Why one in-flight fetch and one in-flight score, instead of full parallelism? Two reasons. Politeness: hammering a competitor's site with ten parallel requests is rude and gets you blocked. And cost: parallel Claude calls are easy to fan out, but for ten pages the total wall time is already short enough that fan-out doesn't pay back the rate-limit risk. The mutex pattern gives you something more useful, which is pipelining between two different kinds of work.
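If you do want bounded fan-out later, the boolean generalizes to a counter without changing the trigger shape. A hypothetical variant of the fetch trigger (fetchingCount replacing the fetching flag in CrawlState; the real trigger is walked through in the next section):

agent.when(
  (s) => !s.done && s.fetchingCount < 3 && s.frontier.length > 0,
  [
    (s) => {
      const url = s.frontier.shift()!;
      s.visited.add(url);
      s.fetchingCount += 1; // up to three requests in flight
      void fetcher(url, seed.origin)
        .then((page) => agent.getState().pages.push(page))
        .finally(() => {
          const cur = agent.getState();
          cur.fetchingCount -= 1;
          agent.setState(cur);
        });
    },
  ],
);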
The code, walked through
Project setup
{
  "type": "module",
  "scripts": {
    "start": "tsx --env-file=.env src/index.ts",
    "mock": "tsx src/mock.ts"
  },
  "dependencies": {
    "@agentiny/core": "^0.5.0",
    "@anthropic-ai/sdk": "^0.92.0"
  }
}

That's the whole dependency surface. agentiny for the reactive loop, the official Anthropic SDK for Claude calls. Three source files: crawler.ts (the agent), extract.ts (a tiny HTML parser), and index.ts (CLI entry).
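index.ts is the least interesting file, but for orientation, a minimal sketch; it assumes crawler.ts exports a crawl() helper that wraps the agent and resolves once the done flag flips, which is one way to package the runner shown later.

// src/index.ts - CLI entry (sketch; the repo's version may differ)
import { crawl } from "./crawler.js";

const [origin, topic] = process.argv.slice(2);
if (!origin || !topic) {
  console.error('usage: npm start -- "<origin-url>" "<topic>"');
  process.exit(1);
}

const pages = await crawl({ origin, topic, maxPages: 10 });

// Print ranked results, best first.
for (const p of [...pages].sort((a, b) => (b.score ?? 0) - (a.score ?? 0))) {
  console.log(`${p.score}/10  ${p.url}  ${p.reason}`);
}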
The fetch trigger
agent.when(
  (s) =>
    !s.done &&
    !s.fetching &&
    s.frontier.length > 0 &&
    s.visited.size < s.maxPages, // page budget guard
  [
    (s) => {
      const url = s.frontier.shift()!;
      s.visited.add(url);
      s.fetching = true; // mutex: one request in flight at a time
      // Fire-and-forget: the action returns immediately, the loop stays live.
      void fetcher(url, seed.origin)
        .then((page) => {
          const cur = agent.getState();
          cur.pages.push(page);
          cur.fetching = false;
          agent.setState(cur); // wake the loop
        })
        .catch(() => {
          // A failed fetch just releases the mutex; the URL stays visited.
          const cur = agent.getState();
          cur.fetching = false;
          agent.setState(cur);
        });
    },
  ],
);

The shape is the important bit. The action does its synchronous work (pop the URL, flip the mutex) and then kicks off the HTTP request as fire-and-forget. The .then mutates state in place and calls agent.setState(cur) to wake the loop. Notice it's the same state object, not a spread. That's deliberate; more on that below. The visited.size < maxPages clause is the page budget: without it, a non-empty frontier keeps this trigger firing and the stop trigger never gets its quiet moment.
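The fetcher itself is ordinary async code. A sketch, with extractTitle and extractLinks standing in for the extract.ts helpers (those names are illustrative):

async function fetcher(url: string, origin: string): Promise<Page> {
  const res = await fetch(url, { headers: { "user-agent": "seo-agent-demo" } });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  const html = await res.text();
  return {
    url,
    title: extractTitle(html),
    text: html.replace(/<[^>]+>/g, " ").slice(0, 4000), // crude body text for scoring
    links: extractLinks(html, url).filter((l) => l.startsWith(origin)), // stay on-site
  };
}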
The score trigger
agent.when(
  (s) => !s.done && !s.scoring && s.pages.some((p) => p.score === undefined),
  [
    (s) => {
      const page = s.pages.find((p) => p.score === undefined)!;
      s.scoring = true; // mutex: one Claude call in flight at a time
      void scorer(s.topic, page).then(({ score, reason }) => {
        const cur = agent.getState();
        const target = cur.pages.find((p) => p.url === page.url);
        if (!target) return;
        target.score = score;
        target.reason = reason;
        cur.scoring = false;
        if (score >= 6) {
          // Relevant page: its links feed the frontier. Below 6, the subtree dies here.
          const fresh = page.links.filter((l) => !cur.visited.has(l) && !cur.frontier.includes(l));
          cur.frontier.push(...fresh);
        }
        agent.setState(cur);
      });
    },
  ],
);

Same shape. The interesting line is the threshold check. If Claude scores the page 6 or higher, its links go onto the frontier. Otherwise the subtree is pruned. That's what makes this a focused crawl rather than a generic one.
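The scorer is the only place Claude appears. A sketch of one plausible implementation with the official SDK; the prompt wording and JSON contract are illustrative, not the repo's exact prompt:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function scorer(topic: string, page: Page): Promise<{ score: number; reason: string }> {
  const msg = await client.messages.create({
    model: "claude-haiku-4-5", // the cheap model is plenty; see the cost note below
    max_tokens: 200,
    messages: [{
      role: "user",
      content:
        `Rate this page's relevance to the topic "${topic}" from 1 (unrelated) to 10 (squarely on topic).\n` +
        `Title: ${page.title}\nURL: ${page.url}\nBody: ${page.text}\n` +
        `Reply with JSON only: {"score": <1-10>, "reason": "<one sentence>"}`,
    }],
  });
  const block = msg.content[0];
  const text = block.type === "text" ? block.text : "{}";
  let parsed: { score?: number; reason?: string } = {};
  try {
    parsed = JSON.parse(text);
  } catch {
    // model ignored the JSON instruction; treat the page as irrelevant
  }
  return { score: parsed.score ?? 1, reason: parsed.reason ?? "unparseable reply" };
}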
A real concurrency bug bit me here while building this. An earlier version used a helper that captured the state reference at action entry, then awaited the LLM call. If another fire-and-forget callback called setState({...cur}) during the await, the captured reference was stale and the mutations got lost. The fix is the mutate-don't-spread pattern you see above: same state reference, mutated in place, setState(cur) only as a signal to wake the loop. Worth knowing if you go reactive with concurrent triggers.
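Boiled down, the two shapes look like this (a simplified reconstruction, not the original helper):

// Buggy: the state reference is captured before an await, then a spread
// copy is written back afterwards.
async function buggyScore(page: Page) {
  const cur = agent.getState(); // reference captured before the await
  const { score } = await scorer(cur.topic, page);
  const target = cur.pages.find((p) => p.url === page.url);
  if (target) target.score = score;
  // While we awaited, the fetch callback pushed a new page via setState.
  // `cur` predates that push, so this spread silently drops the new page:
  agent.setState({ ...cur, scoring: false });
}

// Fixed: re-read state after the await, mutate in place, setState as signal.
async function fixedScore(page: Page) {
  const { score, reason } = await scorer(agent.getState().topic, page);
  const cur = agent.getState(); // fresh reference after the await
  const target = cur.pages.find((p) => p.url === page.url);
  if (target) Object.assign(target, { score, reason });
  cur.scoring = false;
  agent.setState(cur); // same object; nothing clobbered
}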
The stop trigger and the runner
agent.once(
  (s) =>
    !s.done &&
    !s.fetching &&
    !s.scoring &&
    s.pages.every((p) => p.score !== undefined) &&
    (s.visited.size >= s.maxPages || s.frontier.length === 0),
  [
    (s) => {
      s.done = true;
      agent.setState(s); // important: notifies subscribers
    },
  ],
);

The runner uses subscribe rather than agentiny's settle() helper. settle() waits for quiet polling cycles, which doesn't fit fire-and-forget patterns: there can be long quiet gaps while the network and Claude are both busy, and settle resolves prematurely. Subscribing on done and waiting for it to flip is the right signal here.
await new Promise<void>((resolve) => {
  const unsub = agent.subscribe((s) => {
    if (s.done) {
      unsub();
      resolve();
    }
  });
});

Running it
Real run, against any site you have permission to crawl:
$ npm start "https://competitor.com" "AI coding tools"
[fetch] https://competitor.com/ → Home
[score] https://competitor.com/ → 7/10 +12 links
[fetch] https://competitor.com/blog → Blog
[fetch] https://competitor.com/blog/claude-review → ...
[score] https://competitor.com/blog → 9/10 +8 links
[score] https://competitor.com/blog/claude-review → 10/10 +3 links
...
[done] 10 pages crawled

The interesting part is the timing. Here's an actual extract from the mock test, with millisecond timestamps:
1ms → fetch start /
42ms ← fetch done /
42ms → score start /
149ms ← score done / = 8 +3 links
149ms → fetch start /posts
188ms ← fetch done /posts
188ms → fetch start /about
188ms → score start /posts
220ms ← fetch done /about
220ms → fetch start /contact
270ms ← fetch done /contact
288ms ← score done /posts = 8 +2 links

Look at the 188ms mark. The fetcher has just finished /posts and is already onto /about, while the scorer is also working on /posts. By 220ms the fetcher is done with /about and has started on /contact, all while the scorer is still chewing on /posts. Three URLs fetched in the time it took the LLM to score one. That's the agentiny payoff in one frame. With sequential await, that 100ms scoring window would have been pure idle time.
Speaking of mocks: the crawler accepts injected fetcher and scorer functions, so you can run it with no API key at all.
npm run mock

This drives the agent against a fake in-memory site of ten pages, with a fake scorer that marks half of them relevant. Useful for CI, useful for trying the project before committing to a key, and useful for debugging the trigger logic without burning tokens. The injection point is just a config option, so the production code path is unchanged.
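A sketch of what the injected mocks look like; the fake site and the timings here are illustrative, see the repo's mock.ts for the real thing:

// src/mock.ts (sketch) - fake network and fake LLM, same interfaces.
const site: Record<string, Page> = {
  "/":        { url: "/",        title: "Home",    text: "AI tools blog",           links: ["/posts", "/about"] },
  "/posts":   { url: "/posts",   title: "Posts",   text: "AI coding tools roundup", links: ["/contact"] },
  "/about":   { url: "/about",   title: "About",   text: "company history",         links: [] },
  "/contact": { url: "/contact", title: "Contact", text: "email us",                links: [] },
};

const mockFetcher = async (url: string, _origin: string): Promise<Page> => {
  await new Promise((r) => setTimeout(r, 40)); // simulate network latency
  return site[url]!;
};

const mockScorer = async (_topic: string, page: Page) => {
  await new Promise((r) => setTimeout(r, 100)); // simulate LLM latency
  const relevant = page.text.includes("AI");
  return { score: relevant ? 8 : 2, reason: relevant ? "on topic" : "off topic" };
};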
Where this falls short in production
The repo is a working tutorial, not a production crawler. Things you'd want before pointing it at the open web:
- No robots.txt compliance. Add one before crawling sites you don't own. (This bit is non-negotiable.)
- The HTML extractor is regex-based, which is exactly as fragile as it sounds. Swap in linkedom or cheerio.
- One in-flight request and one in-flight score is conservative. With proper rate limiting and a per-host concurrency cap, you could fan out further.
- No persistence. A crash at page 47 of 100 means starting over.
- Cost: the demo uses claude-haiku-4-5, which is cheap. Don't reach for sonnet here unless the relevance judgments are visibly wrong. The smaller model handles this kind of binary classification fine.
None of these are big lifts. The reactive structure makes them easy to slot in: add a robots check inside the fetch action, wire the frontier through SQLite, add a retry trigger alongside the existing three.
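For the robots check specifically, the robots-parser package (not currently a dependency) drops in with a few lines. A sketch:

import robotsParser from "robots-parser";

// Fetch the policy once per origin, before the crawl starts.
const robotsUrl = new URL("/robots.txt", seed.origin).href;
const robotsTxt = await (await fetch(robotsUrl)).text();
const robots = robotsParser(robotsUrl, robotsTxt);

// Then, inside the fetch action, right after s.visited.add(url):
if (robots.isAllowed(url, "seo-agent-demo") === false) {
  return; // disallowed path; the URL is already in visited, so no retry
}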
Wrap-up
That's the whole thing. About 140 lines, real I/O concurrency, mock mode that needs no key, and an honest list of what would break if you took it to production.
Three obvious extensions if you want to keep going. Persist the frontier and visited set in SQLite, so the crawler is resumable. Add a fourth trigger that takes the top-N scored pages and writes a content brief, turning research into a draft. Or expose the whole thing as an MCP server, so Claude itself can call it during a chat.