CLAUDE.md: Helpful or Just Expensive Noise?
If you've used Claude Code for more than a week, you've probably been told you need a CLAUDE.md file. Run /init, let it generate one, commit it, done. It's the first piece of advice in every Claude Code tutorial and the first thing people ask about in every forum. And for months, I just accepted the premise. Of course you need a CLAUDE.md file. It's how you tell Claude Code what to do.
Except... does it actually work? After months of using Claude Code daily, watching Claude cheerfully ignore instructions I explicitly wrote, and then stumbling across the first academic research that actually measured this stuff, I'm not so sure the conventional wisdom holds up. The truth is more interesting—and more useful—than "always use one" or "never bother."
What CLAUDE.md Actually Is (and Isn't)
Let's get the basics out of the way. A CLAUDE.md is a persistent markdown file that Claude Code reads at the start of every session. It sits in a hierarchy: enterprise policy files at the top, then user-level (~/.claude/CLAUDE.md), project-level (./CLAUDE.md), and local files (./CLAUDE.local.md). You can generate a starter one with /init, which scans your codebase and produces something reasonable-looking.
Here's the part most people miss, though. Anthropic's own documentation says it plainly: CLAUDE.md is context, not enforcement. Claude reads it and tries to follow it, but there's no guarantee of strict compliance. And it gets worse. Third-party analysis from HumanLayer found that Claude Code's system prompt wraps your CLAUDE.md content with a reminder that says, essentially, "this context may or may not be relevant to your tasks." It's literally given permission to ignore you.
That reframing changes everything. And it applies equally to AGENTS.md, Codex's equivalent, and any other agent configuration file. You're not writing laws. You're writing suggestions.
The Research Says It Might Be Hurting You
The ETH Zurich Study
In February 2026, researchers from ETH Zurich published the first rigorous empirical evaluation of whether repository context files actually improve coding agent performance. They tested three agents—Claude Code, Codex, and Qwen Code—across 300 SWE-bench Lite tasks and a new 138-task benchmark called AGENTbench.
The results were not what the community expected. LLM-generated context files—the kind /init produces—decreased success rates and increased costs by around 20%. Human-written files did better, showing roughly a 4% improvement on AGENTbench. But here's the kicker: Claude Code was the only agent where even developer-written files failed to improve performance compared to having no file at all.
The most revealing part was an ablation study. When the researchers stripped all existing documentation from repositories—READMEs, docs folders, examples—context files suddenly helped, producing a consistent 2.7% improvement. The implication is clear: CLAUDE.md files are largely redundant with documentation that already exists. They help most where they're the only structured knowledge available.
One finding I found genuinely interesting: agents were highly compliant with tool-specific instructions. When a context file mentioned the uv package manager, agents used it 160 times more often. Context files clearly shape behaviour. Whether that shaping improves outcomes is a different question.
The Compliance Decay Problem
The other elephant in the room is that Claude forgets. Or more accurately, it deprioritises.
Developer Siddhant Khare documented a predictable compliance decay curve: 95%+ compliance at messages 1-2, dropping to 60-80% by messages 3-5, and falling to 20-60% by messages 6-10. Beyond ten messages, original instructions are mostly gone. This isn't a CLAUDE.md bug—it's a fundamental limitation of instruction-following in large language models.
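To make those bands concrete, here's the decay curve as a simple lookup. This is purely illustrative: the thresholds are the ranges quoted above, not a fitted model, and the function name is mine.

```python
def expected_compliance(message_index: int) -> tuple[float, float]:
    """Rough (low, high) compliance band per Khare's observed decay curve.

    Illustrative only: these are the ranges quoted in the article,
    not a fitted or predictive model.
    """
    if message_index <= 2:
        return (0.95, 1.00)   # near-perfect early in the session
    if message_index <= 5:
        return (0.60, 0.80)   # noticeable drift
    if message_index <= 10:
        return (0.20, 0.60)   # instructions mostly deprioritised
    return (0.00, 0.20)       # original instructions effectively gone
```

The practical takeaway isn't the exact numbers; it's the shape. Anything you need followed at message 15 cannot live only in a markdown file read at message 0.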
Here's where it gets uncomfortable. HumanLayer's analysis estimates that frontier LLMs can follow roughly 150-200 instructions with reasonable consistency. Claude Code's own system prompt already contains around 50 instructions. That's nearly a third of the budget consumed before your CLAUDE.md even loads. Every line you add doesn't just compete with your other CLAUDE.md rules—it competes with Claude Code's core behavioural programming. A bloated 300-line CLAUDE.md can actually make Claude worse at following its own built-in instructions for tool use, file management, and code generation.
My Experience After Months of Using It
I'll be honest: I still use CLAUDE.md. But my relationship with it has changed.
My workflow now goes like this: I run /init, read what it generates, then delete most of it. The auto-generated file is a decent starting point for understanding what Claude thinks your project looks like, but committing it unreviewed is a mistake—the research literally shows it underperforms having no file at all. So I treat /init output as reconnaissance, not configuration.
The biggest shift was accepting the "keep it short" principle. Not just for prompts, but for all context you feed an LLM. The less noise Claude has to parse, the more reliably it follows what matters. This matches both the research and Anthropic's own recommendation to target under 200 lines per file. My files tend to land well under that.
I've also watched Claude ignore instructions I explicitly wrote. The formatting rule it followed perfectly for three messages, then abandoned. The test command it ran correctly twice, then forgot existed. It's not malicious—it's the compliance decay curve playing out exactly as documented. One of the more entertaining GitHub issues has Claude itself explaining the problem: "I have two competing modes: Default Mode and CLAUDE.md Mode. My default mode always wins because it requires less cognitive effort." At least it's self-aware about it.
For quick tasks—fixing a single file, answering a question about the codebase, simple refactors—I often skip the CLAUDE.md entirely. And you know what? It usually doesn't matter. The overhead of loading context that isn't relevant to a five-minute task isn't worth the cost, both in tokens and in cognitive budget.
The pattern that actually works is failure-driven iteration. Don't try to write the perfect file upfront. Add rules when Claude fails, remove them when they're redundant. This aligns with how Boris Cherny, who created Claude Code at Anthropic, runs his team's file—roughly 60-80 lines, updated collaboratively through real mistakes. Add a rule when something goes wrong. Tag your colleagues' PRs to capture learnings. Prune regularly.
What Actually Belongs in a CLAUDE.md
What to Include
The signal-to-noise ratio is everything. Your CLAUDE.md should contain things Claude genuinely cannot discover or infer from your codebase:
- Non-obvious build, test, and lint commands with exact flags. If your test runner needs --no-cache --forceExit, say so. Claude won't guess that.
- Architectural decisions that contradict what the code structure might suggest. If you're using a monorepo pattern where packages import from each other in a specific order, that's worth documenting.
- Team conventions that go against common patterns. If your team uses bun instead of npm, or prefers a specific error handling pattern that isn't the obvious one, mention it.
- Brief pointers to deeper documentation. Instead of cramming everything in, reference where Claude can find more: "For API conventions, see docs/api-guide.md." This progressive disclosure pattern keeps your CLAUDE.md lean while making detailed knowledge available on demand.
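Put together, a file following those principles might look like this. Everything here is hypothetical (the commands, package names, and doc paths are invented for illustration), but the shape is the point: short, specific, and nothing Claude could discover on its own.

```markdown
# CLAUDE.md

## Commands
- Test: `bun run test` (wraps jest with `--no-cache --forceExit`; tests hang without those flags)
- Lint: `bun run lint:fix` (never invoke eslint directly; the wrapper loads project rules)

## Architecture
- Packages import in one direction only: core -> adapters -> app. Reverse imports break the build.

## Conventions
- Use `bun`, not `npm`, for all package operations.
- Wrap external API errors in `AppError`; never throw raw fetch errors.

## More detail
- API conventions: docs/api-guide.md
- Error handling: docs/errors.md
```

Notice what's absent: no directory tree, no style guide, no personality instructions. Each line earns its place by encoding something non-obvious.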
What to Leave Out
- Personality instructions like "Be a senior engineer" or "Think step by step." These waste instruction budget on things that don't improve output quality.
- Generic advice like "Write clean code" or "Follow best practices." If it's not specific enough to verify, it's not specific enough to include.
- Comprehensive style guides. This is the anti-pattern HumanLayer calls out specifically: never send an LLM to do a linter's job. LLMs are expensive and slow at enforcing code style compared to running Prettier or ESLint in a hook.
- Directory trees and codebase overviews. The ETH Zurich study confirmed what you'd expect: agents can discover project structure themselves. Telling Claude what files exist in your repo is pure noise.
Use Hooks for Anything That Actually Matters
This is the single most important insight I've landed on: CLAUDE.md rules are requests. Hooks are laws.
If a rule can't be broken—formatting, running tests before commit, type validation—enforce it deterministically with a Claude Code hook, not hopefully with a markdown instruction. A hook runs actual code at specific lifecycle points. It doesn't forget. It doesn't deprioritise. It doesn't have competing cognitive modes.
The linter anti-pattern is the clearest example. Putting "Use 2-space indentation and trailing commas" in your CLAUDE.md means Claude has to remember and apply that rule on every edit, burning instruction budget and still getting it wrong sometimes. Putting prettier --write in a post-edit hook means it happens every single time, instantly, and costs nothing in context. The choice is obvious.
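As a sketch of what that looks like in practice, here's a post-edit formatting hook in `.claude/settings.json`. The exact schema (event names, matcher syntax, the stdin JSON shape) should be checked against Claude Code's hooks documentation; this reflects my understanding of it, and the jq extraction of `tool_input.file_path` is the part most worth verifying.

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | xargs npx prettier --write --ignore-unknown"
          }
        ]
      }
    ]
  }
}
```

With this in place, the formatting rule comes out of CLAUDE.md entirely. Prettier runs after every edit whether Claude remembers anything or not, and the instruction budget it used to consume is freed for rules that genuinely need judgment.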
Think of it this way: CLAUDE.md is guidance for flexible decisions. Hooks are enforcement for non-negotiable rules. If you'd reject a PR for violating it, it belongs in a hook, not a markdown file.
A Practical CLAUDE.md Playbook
When It's Worth It
CLAUDE.md provides the most value in poorly documented repositories—the research backs this up directly. If your README is sparse and there's no docs folder, a lean CLAUDE.md is genuinely the biggest improvement you can make. It's also worth the investment for multi-file workflow tasks that require understanding project conventions, team environments where you want shared standards across developers using Claude Code, and monorepos with directory-scoped files guiding Claude through complex project structures.
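For the monorepo case, directory-scoped files mean Claude loads the context relevant to wherever it's working. A hypothetical layout (package names invented):

```text
CLAUDE.md              # repo-wide basics: package manager, cross-package rules
apps/
  web/CLAUDE.md        # frontend conventions, dev-server quirks
packages/
  core/CLAUDE.md       # library build/test commands with their exact flags
```

Each file stays well under the line budget because it only covers its own directory, which is exactly the progressive-disclosure idea applied to project structure.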
When to Skip It
Well-documented repos with good READMEs and thorough docs. Your CLAUDE.md will mostly duplicate what already exists, and the ETH Zurich study showed that redundancy doesn't help—it hurts. Quick isolated tasks like single-file edits or quick questions don't need project context. And critically: if you're just going to commit the /init output unreviewed, you're better off having no file at all. The research is unambiguous on this point.
The 60-Line Rule
Aim for roughly 60 lines, and treat 80 as a ceiling. The sweet spot from community benchmarks, the research, and Anthropic's own internal usage converges around that range. Boris Cherny's team runs about 60-80 lines. HumanLayer's production file is under 60.
A 60-line CLAUDE.md that Claude actually follows beats a 300-line one it mostly ignores—and costs 20% less per task. That's not a philosophy. That's arithmetic.
The Bottom Line
CLAUDE.md occupies an uncomfortable middle ground: too useful to abandon entirely, too unreliable to trust unconditionally. The academic evidence says context files are mostly redundant overhead for well-documented projects. The community says they're indispensable once properly tuned. Both are right—they're measuring different things.
The highest-leverage insight from all of this is that CLAUDE.md is not an instruction file. It's a context file. Claude treats it as background information, not binding rules. The system literally wraps it with permission to ignore irrelevant content. Once you internalise that distinction, the optimal strategy becomes clear: use CLAUDE.md for the 20% of project knowledge that saves the most repeated explanation, enforce critical rules through hooks rather than hopeful instructions, and resist the temptation to keep adding lines.
Keep it short. Keep it specific. Delete anything Claude already knows. And for anything that truly can't be broken—use a hook.