PylonworksTell us what's eating your time
All posts

What Shipping 10 AI Side Projects Actually Cost Me

Jordan Ellis6 min read

I shipped ten AI side projects since 2023 and kept two. Here is what the other eight cost me in real dollars, the failure modes that killed them, and the spend caps, model choices, and caching that made the survivors cheap enough to leave running.

Across ten AI side projects, the median one cost $43 a month to run and held seven users. The survivors shared one trait: a hard per-task spend cap written before any agent logic. The rest died from runaway cost and silent failures the demo never showed.

What actually kills an AI side project?

The idea was rarely the problem. Of the ten I shipped between 2023 and now, nine are dead or dormant. I went back through the billing and the logs to see what actually killed them, and two patterns covered eight of the nine.

The first is cost creep. An agent that costs 6 cents per run feels free in testing. Ship it, pick up 200 runs a day, add a retry loop, and you are spending $11 a day before you look at the dashboard. One summarizer I built hit $90 over a weekend because a malformed input sent it into a retry spiral on Opus.

The second is silent failure. The agent returns something. It looks plausible. It is wrong, and nobody catches it because there is no eval gate between the model and the user.

The expensive failure is the one that returns a confident, well-formatted, wrong answer. A crash you can alert on. A plausible lie just sits there until a user finds it and leaves.

Idea quality barely moved the needle. The two projects that survived were not my best concepts. They were the ones I instrumented.

How much does running an agent loop actually cost?

The model price list is the floor. Your bill lands well above it. Here is current Anthropic pricing per million tokens as of May 2026:

Model Input ($/M) Output ($/M) Where I use it
Haiku 4.5 $1.00 $5.00 Default for most steps
Sonnet 4.6 $3.00 $15.00 Reasoning and tool routing
Opus 4.8 $5.00 $25.00 Hard multi-file code steps
Opus 4.8 Fast Mode $10.00 $50.00 Almost never

Now multiply by reality. A single agent task is rarely one call. A tool-using loop runs 4 to 12 model calls: plan, call a tool, read the result, decide, repeat. Each call resends the growing context. A 6-step task on Sonnet 4.6 with an 8K context that grows every turn can burn 60K to 120K input tokens total. That adds up fast at volume.

Retries are the quiet killer. A 3x retry on a step that already used 40K tokens triples that step's cost. Cap retries at 2 and set a per-task token budget, or one bad input eats a day of margin.

Generating this post, draft plus two passes, cost about 14 cents on Sonnet 4.6. I keep a meter on everything, including the post telling you to keep a meter on everything.

Which model should you default to?

Default down and escalate up. I start every project on Haiku 4.5 and move a specific step to Sonnet or Opus only when I can measure the cheaper model failing it.

The ladder I follow:

  1. Build the whole thing on Haiku 4.5 first.
  2. Find the one step that fails on real inputs.
  3. Promote only that step to Sonnet 4.6.
  4. Reach for Opus 4.8 only when the step needs reasoning across many files or long chains.

For 7 of my 10 projects, Haiku 4.5 handled the full workload once the prompt was tight. The jump to Opus 4.8 paid off for exactly one project that did multi-file code reasoning. Paying 5x the token rate for a step Haiku gets right is lighting money on the meter.

Why does the demo work and production break?

The demo runs on the three inputs you picked. Production runs on whatever a user pastes.

A receipt parser I built worked on every clean PDF I tested. The first real user uploaded a phone photo taken at an angle: 4MB, rotated 90 degrees, glare across the total. The vision step returned line items that summed to a number printed nowhere on the receipt. No error. Just a wrong answer delivered with full confidence. I had never added a check comparing the extracted total to the printed total, so the wrong answer shipped.

The fix had nothing to do with the model. I added a validation step that flagged the mismatch and asked the user to confirm. Build the guard before you trust the output.

What single change cut my bill the most?

Prompt caching. Cached input tokens cost 90% less than full price. If your agent resends a 4K-token system prompt and tool schema on every one of 10 calls in a loop, caching that stable prefix turns 40K full-price input tokens into 4K full plus 36K at a tenth of the rate.

{
  "model": "claude-haiku-4-5",
  "system": [
    {
      "type": "text",
      "text": "<4K tokens of stable instructions and tool schema>",
      "cache_control": { "type": "ephemeral" }
    }
  ]
}

Mark the static prefix once and every later call in the loop reads it cheap. Batch processing is the other lever: 50% off when the work can wait. Overnight summaries, backfills, and eval runs do not need to be synchronous.

How do you stop a runaway loop before it drains a hobby budget?

Three guards, written before the interesting part: a turn ceiling, a token budget, and a retry cap.

const MAX_TURNS = 8;
const MAX_TOKENS = 150_000;
let turns = 0;
let tokensUsed = 0;

while (!done) {
  if (turns++ >= MAX_TURNS) throw new Error("turn ceiling hit");
  if (tokensUsed >= MAX_TOKENS) throw new Error("token budget blown");
  const res = await callModel(ctx); // internal retries capped at 2
  tokensUsed += res.usage.input_tokens + res.usage.output_tokens;
}

This is 12 lines. It would have saved me the $90 weekend. Write it first, before the prompt, before the tools. The agent that cannot bankrupt you is the one you can leave running.


FAQ

What is the cheapest way to run a Claude agent in 2026?

Default to Haiku 4.5 at $1/$5 per million tokens, cache your static system prompt and tool schema for 90% off repeated input, and batch any work that can wait for 50% off. Most agent workloads do not need Sonnet or Opus once the prompt is tight.

How many side projects out of ten should I expect to keep?

For me it was two. Most die from running cost and unnoticed wrong answers. Idea quality matters less than the spend caps and eval gates you add on day one.

Do I need Opus for an agent, or is Sonnet enough?

Sonnet 4.6 handles most reasoning and tool routing at $3/$15. I only reach for Opus 4.8 when a step reasons across many files or long chains. Promote one step at a time and measure the failure before you pay 5x.


Tired of re-keying the same data between tools? Pylonworks builds custom automation and internal tools for businesses without a developer, on a fixed quote you approve up front. Tell us what's eating your time

Back to all posts