What is the difference between a soft limit and a hard budget for an AI agent?

A soft limit lives in application code — the agent is instructed not to spend more than $X. It can be bypassed by bugs, misinterpretation, or prompt injection. A hard budget is enforced by infrastructure: a card balance or IOU ceiling that the agent cannot exceed regardless of its behavior. Only hard budgets reliably contain overspending.

How much should I budget for an AI agent task?

Size the budget to the task, not to comfort. Estimate the tool calls required (web searches, image generations, API calls), multiply by the per-call cost, and add a 10–20% buffer for retries. A research task requiring 20 web searches costs roughly $0.10. A $50 budget for that task is a 500x overestimate — and a 500x larger blast radius if something goes wrong.

What are per-category spending limits for AI agents?

Per-category limits cap spending by tool type — web search, image generation, email — independently of the total balance. An agent might have a $5 total balance but a $1 cap on image generation specifically, preventing a runaway image loop from consuming the entire budget.

Can I give different agents different budgets?

Yes, and you should. Each agent should have its own isolated account with a budget sized to its specific task. A researcher and a buyer should not share a balance — if the buyer's payment logic misbehaves, the researcher's budget should be unaffected. Per-agent isolation is a core financial zero trust principle.

What happens when an agent's budget runs out?

With a structural ceiling (IOU balance or card), the agent stops — the next tool call is declined. With a soft limit, the behavior depends on whether the agent's code correctly implements the limit. This is why structural ceilings are preferred: the outcome is deterministic regardless of agent behavior.

How to Give an AI Agent a Budget

The most common approach to agent budgeting: give the agent a card with a $500 limit, assume it won’t spend more than it needs to, and monitor the transaction log occasionally. This works until it doesn’t — and when it doesn’t, the failure modes are expensive.

The right approach is different. The budget is a design decision, not a configuration detail. It gets made before the agent runs, sized to the actual task, and enforced by infrastructure the agent can’t influence.

[Image: ATXP robot holding a wallet with a balance meter showing task budget and hard ceiling line]

What “giving an agent a budget” actually means

Giving an agent a budget means setting a structural spending ceiling before the agent runs. Not a soft instruction (“don’t spend more than $10”), but a hard limit enforced outside the agent’s own logic.

The distinction matters because agents fail. Retry loops, misinterpreted instructions, prompt injection — any of these can cause an agent to spend more than intended. A soft limit embedded in application code can be bypassed by the same failure that caused the overspend. A structural ceiling can’t be.

Definition — Structural Spending Ceiling

A structural spending ceiling is a financial limit enforced by infrastructure rather than by the agent's behavior. A card balance and an IOU token balance are both structural ceilings: when the balance reaches zero, the agent stops regardless of what it's doing or why. The ceiling is external to the agent's code and cannot be bypassed by bugs, misinterpretation, or prompt injection.

— ATXP
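The difference can be sketched as a toy model in Python. This is not ATXP's implementation: `Ledger`, `Declined`, and `run_agent` are hypothetical names, and the $0.004 per-call price is an assumed figure. The agent's own check contains a deliberate units bug to stand in for any soft-limit failure, while the ledger sits outside the agent and declines the charge anyway.

```python
from decimal import Decimal


class Declined(Exception):
    """Raised by the ledger when a charge would exceed the balance."""


class Ledger:
    """Toy stand-in for infrastructure-side enforcement (hypothetical)."""

    def __init__(self, balance: str) -> None:
        self.balance = Decimal(balance)

    def charge(self, amount: Decimal) -> None:
        # The check lives here, outside the agent's own logic.
        if amount > self.balance:
            raise Declined(f"charge {amount} exceeds balance {self.balance}")
        self.balance -= amount


def run_agent(ledger: Ledger, calls_wanted: int, soft_limit: Decimal) -> int:
    """Agent loop with a buggy soft limit; returns the number of calls made."""
    price = Decimal("0.004")  # assumed per-call price in USD
    spent = Decimal("0")
    made = 0
    for _ in range(calls_wanted):
        # Bug on purpose: the limit is treated as cents, so this check
        # would fire at $10.00 instead of $0.10.
        if spent >= soft_limit * 100:
            break
        try:
            ledger.charge(price)
        except Declined:
            break  # the structural ceiling stops the agent
        spent += price
        made += 1
    return made


ledger = Ledger("0.20")
calls = run_agent(ledger, calls_wanted=1000, soft_limit=Decimal("0.10"))
print(calls, ledger.balance)
```

With the ledger in place, the run stops after 50 calls at the $0.20 ceiling. Without it, the same bug would let all 1,000 calls through, about $4.00 at the assumed price.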

Sizing the budget to the task

The most common budgeting mistake isn’t setting the limit too low; it’s setting it too high. A $100 budget for a task that costs $0.30 doesn’t feel like a risk, but it means the worst-case financial damage from that agent going wrong is $100, not $0.30.

The correct approach: estimate the task’s actual cost, add a buffer for retries, and set the budget there.

Agent task	Typical tool calls	Estimated cost	Appropriate budget
Research task (20 web searches)	20 × `web_search`	~$0.08	$0.15
Competitive analysis (browse 5 pages)	5 × `web_browse`	~$0.10	$0.20
Image generation batch (10 images)	10 × `image_generate`	~$0.40	$0.60
Email campaign (50 sends)	50 × `email_send`	~$0.10	$0.20
Full research + report + email	Mixed	~$0.25	$0.40

The buffer exists for retries and edge cases, not for comfort. A 50% buffer is reasonable. A 10x buffer is not — it’s an uncontrolled blast radius.
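The sizing rule is plain arithmetic, and a short Python sketch makes it repeatable. The per-call prices below are assumptions back-derived from the table's estimates (for example, ~$0.08 for 20 searches gives $0.004 each), not a published price list, and the function names are hypothetical. The default buffer is the 50% figure mentioned above.

```python
from decimal import Decimal, ROUND_UP

# Assumed per-call prices in USD, back-derived from the table's estimates.
PRICE_PER_CALL = {
    "web_search": Decimal("0.004"),
    "web_browse": Decimal("0.02"),
    "image_generate": Decimal("0.04"),
    "email_send": Decimal("0.002"),
}


def estimate_cost(calls: dict) -> Decimal:
    """Sum of (number of calls x assumed price) over every tool used."""
    return sum(
        (PRICE_PER_CALL[tool] * count for tool, count in calls.items()),
        Decimal("0"),
    )


def size_budget(calls: dict, buffer: Decimal = Decimal("0.5")) -> Decimal:
    """Estimated cost plus a retry buffer, rounded up to the next cent."""
    cost = estimate_cost(calls)
    return (cost * (1 + buffer)).quantize(Decimal("0.01"), rounding=ROUND_UP)


def overshoot(budget: Decimal, cost: Decimal) -> Decimal:
    """How many times larger the budget (the worst case) is than the task cost."""
    return budget / cost


image_batch = {"image_generate": 10}
print(estimate_cost(image_batch), size_budget(image_batch))
print(overshoot(Decimal("100"), Decimal("0.30")))
```

Under these assumed prices, the image batch comes out at an estimated $0.40 and a $0.60 budget, matching that row of the table. The last line puts a number on the $100 budget for a $0.30 task: a worst case more than 300 times the task cost.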

"I was shocked how cheap it actually is once you're routing efficiently. The agents that seemed expensive were the ones with unnecessary overhead, not the ones doing a lot of work."

Louis Amira, co-founder, Circuit & Chisel

Most tasks are cheaper than they look. The budgeting instinct is to overestimate because it feels safer — but the overestimate is the risk.

How to implement the budget

Two models for structural ceilings, used in combination:

IOU token balance — for tool calls (web search, image generation, email, code execution). Fund an account with the task budget. Each tool call deducts automatically. Balance hits zero: agent stops.

# Fund for a specific task
npx atxp fund --agent "researcher" --amount 0.20

# Set per-category limits within that balance
npx atxp limits --agent "researcher" --web-search 0.15 --image-gen 0.05

# Check balance before running
npx atxp balance --agent "researcher"

Virtual card — for third-party merchant purchases requiring a real card number. Load with the specific purchase amount. One card per task, revoked when done.

For most agent stacks: IOU tokens for all tool infrastructure, virtual cards only when a merchant requires a card number. The overhead and economics of cards don’t work for sub-dollar tool calls.

[Image: agent budget sizing, three agent roles with balance meters showing task-appropriate budget levels]

Per-category limits

A total balance controls how much the agent can spend overall. Per-category limits control how it can spend within that balance. Both are useful; they solve different problems.

Limit type	What it prevents	Example
Total balance	Any overspend beyond task budget	$0.20 balance for a $0.15 task
Per-category	Runaway usage of one expensive tool	$0.05 cap on image generation
Per-agent isolation	One agent affecting another’s budget	Separate accounts per agent

An agent with a $1 total balance but no category limits could theoretically spend all $1 on image generation — 25 images when 2 were needed. Category limits prevent that without reducing the total budget.
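The three limit types in the table can be modeled together in a small Python sketch. `Account` is a hypothetical class, not the ATXP API, and the $0.04 image price is an assumption derived from the earlier table (10 images for ~$0.40). Each agent gets its own account, so a category cap and per-agent isolation both show up in the same run.

```python
from decimal import Decimal
from typing import Optional


class Account:
    """Toy per-agent account: a total balance plus optional category caps."""

    def __init__(self, balance: str, caps: Optional[dict] = None) -> None:
        self.balance = Decimal(balance)
        self.caps = {name: Decimal(cap) for name, cap in (caps or {}).items()}
        self.spent = {}  # running spend per category

    def charge(self, category: str, amount: str) -> bool:
        """Apply the charge and return True, or decline and return False."""
        cost = Decimal(amount)
        used = self.spent.get(category, Decimal("0"))
        cap = self.caps.get(category)
        if cost > self.balance:
            return False  # total balance exhausted
        if cap is not None and used + cost > cap:
            return False  # category cap reached
        self.balance -= cost
        self.spent[category] = used + cost
        return True


# Runaway image loop: 25 attempts at an assumed $0.04 per image.
uncapped = Account("1.00")
without_cap = sum(uncapped.charge("image_generate", "0.04") for _ in range(25))

researcher = Account("1.00", caps={"image_generate": "0.08"})
with_cap = sum(researcher.charge("image_generate", "0.04") for _ in range(25))

# A separate buyer account can drain itself without touching the researcher.
buyer = Account("0.20")
buyer_ok = sum(buyer.charge("purchase", "0.05") for _ in range(10))

print(without_cap, with_cap, buyer_ok, researcher.balance)
```

The uncapped account accepts all 25 image charges and ends at zero. The capped account accepts 2 and keeps $0.92 for other tools, and the buyer's exhausted balance leaves the researcher's account unchanged.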

Common mistakes

Setting the budget to a round number. $10, $50, $100 — these numbers are arbitrary. They feel safe because they’re familiar amounts, not because they match the task. A $10 budget for a $0.10 task means the worst case is $10.

Sharing a balance across agents. If a buyer agent and a researcher share a balance, the buyer’s retry loop can exhaust the researcher’s budget. Per-agent isolation means each agent’s worst case is bounded to its own ceiling.

Treating the balance as approximate. “I funded it with $5 and the task cost $0.40 — fine.” The $4.60 that didn’t get used is $4.60 of blast radius sitting in that account. Refund or reset after each task run.

Setting limits in application code instead of infrastructure. Code-level limits can be bypassed. Infrastructure-level limits cannot. If your budget enforcement lives in a conditional in the agent’s task loop, it’s a soft limit — which means it can fail exactly when you need it most.
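The leftover-balance mistake above is easy to quantify. The Python sketch below computes the exposure left after a run and models the reset step on a plain dictionary. The function names are hypothetical, and since no refund command is shown here, the sweep is illustrative only.

```python
from decimal import Decimal


def leftover_exposure(funded: str, spent: str) -> Decimal:
    """Balance still spendable after the task has finished."""
    return Decimal(funded) - Decimal(spent)


def close_out(balances: dict) -> Decimal:
    """Zero every account in place and return the total swept back."""
    refunded = sum(balances.values(), Decimal("0"))
    for agent in balances:
        balances[agent] = Decimal("0")
    return refunded


# The example from the text: funded with $5, task cost $0.40.
exposure = leftover_exposure("5.00", "0.40")

# Hypothetical end-of-run balances for two agents.
balances = {"researcher": exposure, "buyer": Decimal("0.05")}
refunded = close_out(balances)

print(exposure, refunded, balances)
```

After the sweep, every account starts its next run from zero and is funded again to that task's size, so no unused balance carries over as blast radius.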

npx atxp

Pre-funded accounts. Per-agent isolation. Per-category limits.

Frequently asked questions

How do I give an AI agent a budget?

Fund an account (IOU balance) up front with the amount the task requires. The balance is the structural ceiling: when it hits zero, the agent stops. Set it before the agent runs, sized to the task cost plus a 10–20% retry buffer.

What’s the difference between a soft limit and a hard budget?

Soft limits live in application code and can be bypassed by bugs or misinterpretation. Hard budgets are enforced by infrastructure (IOU balance, card balance) and cannot be bypassed by the agent regardless of its behavior.

How much should I budget for a task?

Estimate the tool calls required, multiply by per-call cost, add 10–20% for retries. Most tasks cost under $0.50. A research task with 20 web searches costs roughly $0.10. Size to that — not to a comfortable round number.

Can I set limits by tool type?

Yes. `npx atxp limits --web-search 0.15 --image-gen 0.05` sets per-category caps within the total balance. Useful when one tool type could consume a disproportionate share of the budget.

Should each agent have its own budget?

Yes. Per-agent isolated accounts mean one agent’s overspend can’t affect another’s.

What happens when the budget runs out?

The next tool call is declined. The agent stops. You see the balance hit zero in the transaction log and can investigate, refund, and re-run.