Building With Claude Code: Patterns That Work and Ones That Don't
What actually holds up when building with Claude Code, with real token costs for Sonnet 4.6 and Opus 4.8, why a fat CLAUDE.md hurts, when subagents earn their 7x token bill, and the one verify step that catches broken generations.
Building with Claude Code works best when you treat the context window as a scarce budget and the agent as a junior engineer that needs a written spec. Keep CLAUDE.md lean, push verbose work into subagents, turn on prompt caching, and make the agent run its own tests. The patterns that fail quietly are the ones that cost the most.
What actually changes when you build with Claude Code?
The model stopped being the bottleneck a while ago. Claude Opus 4.8 and Sonnet 4.6 both write most of the code I need on the first pass. What breaks is everything around the model: how much context you feed it, how tightly you scope a task, whether the agent can check its own work, and how many tokens you spend getting there.
I've been shipping code through Claude Code since late 2023. The wins and the disasters come down to a small set of repeatable patterns. Here's what I keep, what I throw out, and the numbers behind each call.
Which patterns hold up when building with Claude Code?
| Pattern | Verdict | Why it lands that way |
|---|---|---|
| Lean CLAUDE.md (under ~300 lines) | Keep | Loaded every turn; bloat eats attention and tokens on every call |
| Subagents for search and verbose output | Keep | Isolates noise so the main context stays clean |
| Prompt caching on stable context | Keep | ~90% cheaper on cached input |
| Agent runs its own verify (tsc, tests) | Keep | Catches generations that look right and don't compile |
| One giant prompt with the whole task | Drop | Context fills, attention degrades, quality drops mid-task |
| Unsupervised multi-hour loops | Drop | A wrong call at step 3 compounds; you pay for every wrong turn |
| Agent picks its own model | Drop | Opus on a rename is money set on fire |
Why does a fat CLAUDE.md make the agent worse?
CLAUDE.md is the file Claude Code loads into context on every turn. That's the feature and the trap. A 1,200-line CLAUDE.md runs 12,000 to 16,000 tokens, and it rides along on every request in the session. Prompt caching makes re-reading it cheap, around 90% off the cached input rate. The attention cost does not cache. When half your context is standing instructions, the model has less room to reason about the task in front of it, and quality slips in ways that are hard to spot.
I keep mine under 300 lines. Anything longer is usually three rules doing real work and forty the model would have followed anyway. Put the stable, must-never-break invariants in CLAUDE.md. Everything situational goes in the prompt where it belongs.
When do subagents help, and when do they just burn tokens?
A subagent is a separate Claude Code session with its own context window, its own tool permissions, and a summary that returns to the main session. The isolation is the point. When I need to grep across forty files to find where a symbol is defined, I send a subagent. It reads the noise, I get the answer, and my main context never sees the forty file excerpts.
The cost is real. Each subagent carries its own context, so a delegation-heavy workflow can run roughly 7x the tokens of a single-threaded session. I don't spawn a subagent to rename a variable in one file. I spawn one for open-ended search, verbose log analysis, or any read-heavy task whose output I'd otherwise have to carry for the rest of the session.
What does building with Claude Code actually cost per run?
Here's a real session shape: a medium feature that reads context, writes a few files, and runs tests lands around 150,000 input tokens and 30,000 output tokens across the whole run. The math:
- Sonnet 4.6 ($3 / $15 per MTok): 150k input = $0.45, 30k output = $0.45. About $0.90 a run.
- Opus 4.8 ($5 / $25 per MTok): 150k input = $0.75, 30k output = $0.75. About $1.50 a run.
Caching changes the picture. On the stable 80% of input (CLAUDE.md, the file context you re-read), caching knocks about 90% off that portion. For interactive building I run Sonnet 4.6 by default and reach for Opus 4.8 only when a task genuinely needs the deeper reasoning. Letting the agent default to Opus for everything is the fastest way to triple a bill for no quality gain.
The pattern that keeps biting people: unsupervised loops
The seductive demo is the agent that runs for two hours and hands you a finished feature. In practice, small errors compound. The agent makes a wrong assumption at step 3, builds four more steps on top of it, and you pay full token price for all of it before you notice.
A loop that can't check its own work doesn't save you time. It lets a wrong assumption from minute three reach the final commit, and you pay full token price the whole way down.
I cap autonomous runs at one bounded task with a verify step, then I look. Bounded and checked beats long and confident every time.
What does a CLAUDE.md that earns its place look like?
The single highest-leverage pattern is making the agent verify itself before it claims done. Put the exact commands in CLAUDE.md so it runs them every time:
## Before you call any task done
Run these and fix what breaks. No exceptions.
- pnpm exec tsc --noEmit
- pnpm exec prettier --check .
- pnpm exec vitest run
Do not report success until all three pass.
Paste the failing output if you can't make them green.
That one section catches the 15 to 20% of generations that look right and don't compile. It's the difference between reviewing working code and debugging the agent's optimism.
The one thing to change this week
Open your CLAUDE.md and count the lines. If it's over 300, cut it to the invariants that would actually break something if violated, and move the rest into prompts. Then add a verify block with your real test command. Those two changes fix the two most expensive failure modes, degraded attention and unverified output, for free.
FAQ
How much does a Claude Code session cost?
A medium feature session runs roughly 150k input and 30k output tokens. On Sonnet 4.6 ($3 / $15 per MTok) that's about $0.90; on Opus 4.8 ($5 / $25) about $1.50. Prompt caching cuts the cached input portion by ~90%. Most people building interactively use a Max subscription and save the API rate for automated runs.
Should I use subagents for everything?
No. Subagents shine for verbose, read-heavy, self-contained work where you only need the summary. Because each one carries its own context, over-delegating can run about 7x the tokens. Use them for search and log analysis, not single-file edits.
What's the most common mistake when building with Claude Code?
Treating the context window as infinite. A bloated CLAUDE.md and a giant all-at-once prompt both fill context with noise, the model's attention degrades, and quality drops mid-task. Scope tasks small and keep standing instructions lean.
Tired of re-keying the same data between tools? Pylonworks builds custom automation and internal tools for businesses without a developer, on a fixed quote you approve up front. Tell us what's eating your time