How Much Does It Cost to Run an AI Agent? A Per-Run Breakdown
The honest answer to AI agent costs: roughly $200 to $8,000 a month, but the real number hides in cost per run, not the model price.
It depends, and anyone who gives you a single number is selling something. A realistic range: a small, single-purpose agent runs somewhere around $200 to $1,000 a month, and a framework-based agent handling real volume lands between $1,500 and $8,000 a month. But that monthly figure is the wrong unit. The number that actually predicts your bill is cost per run, and per-run cost on an agent swings wildly. One run might cost you five cents. The next, on the same agent, with a near-identical prompt, costs two dollars. That spread is the whole story, and it is the part most teams discover only after the invoice arrives.
This is the difference between an agent and a chatbot. A chatbot request is roughly fixed: one prompt in, one completion out, a predictable token count. An agent is a loop. It reasons, calls a tool, reads the result, reasons again, maybe retries a failed call, maybe spawns a sub-task. Every one of those steps is a billable model call. The loop length is not something you set. The model decides it, at runtime, based on how hard the task turned out to be.
What does “cost to run an AI agent” actually mean?
There are three separate bills, and people conflate them constantly.
The first is token spend: what the model provider charges for input and output tokens. This is the line item everyone fixates on, and it is usually not the one that hurts.
The second is infrastructure: the server or serverless platform the agent loop runs on, the vector database for retrieval, the queue, the orchestration layer. A framework-based agent built on something like LangChain or CrewAI carries this weight whether it is busy or idle.
The third is observability and ops: logging every step, tracing every tool call, storing the traces, and the human time spent reading them when a run goes sideways. Teams budget for tokens and forget that watching the agent costs real money too. At volume, the observability bill can rival the token bill.
Add them up and you get the monthly figure. But the monthly figure is an average of a distribution, and the distribution has a long tail.
Why is the cost per run so unpredictable?
Because the model controls the loop, not you.
Give an agent a simple task and it might make one model call, one tool call, and finish. Give it an ambiguous task and it might reason for four steps, call three tools, get a malformed result, retry twice, then reason for two more steps before answering. Same agent. Same code. The second run cost maybe fifteen times the first.
Three things drive the spread:
- Reasoning depth. Harder inputs trigger longer chains of thought, and on reasoning-capable models that thinking is billed as output tokens. A run that needs to plan costs more than a run that can answer directly.
- Tool calls. Each round trip to a tool means another model call to interpret the result. An agent that searches, reads, and cross-checks burns more calls than one that answers from context.
- Retries. A failed tool call, a timeout, a malformed response, a rate limit. Every retry replays context and re-bills it. Retries are where “cheap” runs quietly become expensive ones.
None of these are knobs you turn before the run. They emerge from the input. That is why a per-run cost ceiling is so hard to set, and why “average cost per run times expected runs” is a forecast, not a guarantee.
How much do the tokens themselves cost?
Per-token pricing is the floor, and it is more knowable than the rest. Anthropic publishes current rates for Claude, verified as of this writing:
| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| Claude Opus 4.8 | $5.00 | $25.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
Other providers sit in similar bands. Frontier models from the major labs cluster in the low single digits per million input tokens and higher per million output tokens, with smaller and faster models running roughly a third to a fifth of that. Those are illustrative ranges from public provider pricing, and they shift, so always check the current rate before you model your costs.
Translate that to a single interaction and you get the figure industry reports keep landing on: roughly $0.015 to $0.12 per typical interaction, depending on model choice, prompt length, and how much the model writes back. That sounds trivial. The trap is the word “interaction.” An agent run is not one interaction. It is a chain of them. Multiply $0.015 to $0.12 by the number of steps in a reasoning loop, and a $0.05 run and a $2 run are both perfectly ordinary.
Notice the output column. On every model, output costs several times more than input. Agents that generate long plans, write code, or produce verbose tool arguments pay disproportionately on the expensive side of the ledger. If your per-run cost is creeping up, the output tokens are usually the reason.
What does a real monthly bill look like by deployment type?
Here is where the ranges live. These are realistic bands drawn from reported agent deployments, not quotes for any specific stack.
| Deployment type | Typical monthly cost | Cost per run | What drives it |
|---|---|---|---|
| Single-purpose agent (low volume) | $200 to $1,000 | $0.05 to $0.50 | Mostly tokens; thin infra |
| Single-purpose agent (steady volume) | $800 to $2,500 | $0.10 to $0.80 | Tokens plus a real observability bill |
| Framework-based agent (LangChain, CrewAI-style) at volume | $1,500 to $8,000 | $0.20 to $2.00+ | Tokens, infra, retries, multi-tool loops |
A few things to read out of this table.
The single-purpose agent is cheap because its loop is short and its tool surface is small. It answers a narrow question, calls one or two tools, and stops. Cost per run stays tight because the variance is low.
The framework-based agent is expensive because flexibility costs money. More tools means more round trips. More autonomy means longer loops. The frameworks that make agents easy to build also make them easy to make slow and chatty, and chatty agents are expensive agents. The top of that $8,000 band is almost always a retry problem or a tool-call sprawl problem, not a token-price problem.
Where do agent bills actually leak?
The leaks are rarely the model price. They are structural.
Retry storms. A flaky downstream API fails, the agent retries, each retry replays the full context, and a run that should have cost $0.20 costs $1.50. At scale, a 5% retry rate is a 5% larger bill before you account for the context replay, which makes it worse.
Context bloat. Agents accumulate history. Every turn re-sends the conversation, the tool definitions, the system prompt. Without prompt caching or context trimming, input tokens grow turn over turn, and you pay full freight to re-read the same bytes. This is the single most common silent leak, and it is fixable.
Verbose tool arguments. Models sometimes generate enormous, over-specified tool calls. Output tokens, billed at the high rate, for arguments a tighter schema would have kept short.
Observability you forgot to budget. Tracing every step of every run produces a lot of data. Storing it, indexing it, and querying it is a line item. Teams that skip it save money and lose the ability to debug the expensive runs, which is exactly when they need the traces.
The idle infrastructure floor. A framework agent on always-on infrastructure pays rent whether it runs once a day or once a second. Low-volume agents on heavy infra have a terrible cost per run because the fixed cost divides across too few runs.
How do you actually control the cost?
You attack the loop, not the price.
Cap the loop. Set a hard maximum on reasoning steps and tool calls per run. This turns the worst-case run from unbounded into bounded, which is the single biggest lever on the long tail.
Route by difficulty. Send easy runs to a cheaper, faster model and reserve the expensive model for runs that genuinely need it. A Haiku-class model handling the simple 80% and an Opus-class model handling the hard 20% beats running everything through the top tier.
Cache the stable prefix. Prompt caching cuts the cost of re-reading the system prompt and tool definitions on every turn to roughly a tenth of the input price. For multi-turn agents this is not optional; it is the difference between a sane bill and a bloated one.
Trim context. Drop stale tool results and old turns once they stop being relevant. Shorter input, lower bill, often better behavior.
Measure cost per run, not cost per month. Watch the distribution. The mean tells you the budget. The tail tells you where the money is actually going.
FAQ
How much does a simple AI agent cost per month to run?
$200 to $1,000 a month for a single-purpose, low-to-moderate-volume setup. The ceiling rises fast once you add real observability — storing and indexing traces is a line item most teams omit from their initial estimate, and at a few thousand runs a day it stops being rounding error.
Why is agent cost so unpredictable compared to a chatbot?
The model controls the loop length, not you. Easy inputs can resolve in a single tool call; ambiguous ones can cascade into retries and multi-step reasoning that cost fifteen times more — with nothing in the prompt that signals which kind you are sending. That variance is structural, not a tuning problem, which is why cost per run follows a distribution with a long tail rather than clustering around a mean.
What is the biggest hidden cost in running an AI agent?
Context bloat is the quietest one. Each turn re-sends the full system prompt, tool definitions, and prior history, so input tokens compound across a session. Prompt caching cuts that to roughly a tenth of standard input pricing; without it, multi-turn agents often pay more in repeated context than in new reasoning. Retry storms are the loudest: one flaky downstream call can multiply a run’s cost several times before the agent gives up.