Agent Spending Analytics: What to Track and Why

“How much did my agents spend last month?” is the wrong question.

The right questions are: what did each agent spend per completed task, which agents are becoming more expensive over time, and is there an agent that spent $50 on a task that should cost $0.50?

That’s the difference between cost reporting and agent spending analytics.

Why Basic Cost Tracking Isn’t Enough

Raw spend totals tell you whether to be alarmed. They don’t tell you:

  • Whether your agents are getting more or less efficient over time
  • Which users are driving disproportionate costs in a multi-tenant app
  • Which model selection is costing you more without better results
  • When an agent is looping or misbehaving (often visible as a cost spike before you notice the behavior)

Agents that handle more tasks should cost more. Agents that handle the same tasks at increasing cost are a problem. The two cases look identical in raw spend reports.

The Metrics That Matter

1. Cost Per Completed Task

What it is: Total agent spend divided by the number of tasks completed in a time window.

Why it matters: Drift in cost-per-task usually signals a model regression, prompt degradation, or tool failure that’s causing the agent to retry more.

How to track it: Log task start/end events alongside ATXP transaction data. Match spend to task IDs.

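# Example cost-per-task calculation.
# get_transactions and count_completed_tasks are placeholders for your own
# data access. get_transactions should return only the transactions from the
# window in which the given tasks ran.
def cost_per_task(agent_id: str, task_ids: list[str]) -> dict:
    txns = get_transactions(agent_id)
    tasks_completed = count_completed_tasks(task_ids)
    total_spend = sum(t["cost_usd"] for t in txns)
    return {
        "total_spend": total_spend,
        "tasks_completed": tasks_completed,
        # None, not 0: spend with no completed tasks is a warning sign, not a free run
        "cost_per_task": total_spend / tasks_completed if tasks_completed else None,
    }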
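Matching spend to task IDs can be sketched as a time-window join between your own task log and the transaction list. This is a minimal illustration, not an ATXP feature: the task fields (`task_id`, `started_at`, `ended_at`) and the transaction field name `timestamp` are assumptions, and timestamps are assumed to be ISO 8601 strings in the same timezone convention.

```python
from datetime import datetime


def spend_by_task(txns: list[dict], tasks: list[dict]) -> dict[str, float]:
    """Attribute each transaction to the first task whose window contains it.

    Assumed shapes (hypothetical, adapt to your own logs):
      txn:  {"timestamp": ISO 8601 string, "cost_usd": float}
      task: {"task_id": str, "started_at": ISO 8601, "ended_at": ISO 8601}
    """
    totals = {task["task_id"]: 0.0 for task in tasks}
    # Parse task windows once instead of once per transaction.
    windows = [
        (
            task["task_id"],
            datetime.fromisoformat(task["started_at"]),
            datetime.fromisoformat(task["ended_at"]),
        )
        for task in tasks
    ]
    for txn in txns:
        ts = datetime.fromisoformat(txn["timestamp"])
        for task_id, start, end in windows:
            if start <= ts <= end:
                totals[task_id] += txn["cost_usd"]
                break  # transactions outside every window stay unattributed
    return totals
```

A time-window join only works when an agent runs one task at a time. If tasks overlap, you would need to carry the task ID through to each call in your own logging instead.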

2. Spend Rate (Per Hour/Day)

What it is: Rolling spend per unit time for each agent.

Why it matters: Anomalous spend rate is often the first signal of a misbehaving agent — before you notice wrong outputs or user complaints.

Threshold to alert: When observed spend rate exceeds 2-3x the trailing 7-day average for that agent.
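As a rough sketch of that threshold, the function below compares the last hour of spend against the average hourly spend over the preceding 7 days. The helper names, the `timestamp` and `cost_usd` field names, and the default 3x multiplier are illustrative assumptions.

```python
from datetime import datetime, timedelta


def spend_in_window(txns: list[dict], start: datetime, end: datetime) -> float:
    """Sum cost_usd for transactions with start <= timestamp < end."""
    return sum(
        t["cost_usd"]
        for t in txns
        if start <= datetime.fromisoformat(t["timestamp"]) < end
    )


def spend_rate_alert(txns: list[dict], now: datetime, multiplier: float = 3.0) -> bool:
    """True when last-hour spend exceeds multiplier x the trailing 7-day hourly average."""
    hour_ago = now - timedelta(hours=1)
    last_hour = spend_in_window(txns, hour_ago, now)
    # The baseline excludes the last hour so a spike does not inflate its own baseline.
    baseline_total = spend_in_window(txns, hour_ago - timedelta(days=7), hour_ago)
    baseline_hourly = baseline_total / (7 * 24)
    return baseline_hourly > 0 and last_hour > multiplier * baseline_hourly
```

A new agent with no history has a zero baseline and never alerts here, so you may want a fixed absolute ceiling as well until a week of data exists.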

3. Model Cost Breakdown

What it is: Split of costs by model and by input vs. output tokens.

Why it matters: Output tokens are typically priced several times higher than input tokens (often around 3-5x, depending on the model and provider). Agents that generate verbose reasoning chains before answering have a different cost profile than terse, direct agents. Knowing this split tells you where prompt optimization will pay off.
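One way to see the split is to price input and output tokens separately per model. The sketch below uses a hypothetical price table and assumed transaction field names (`model`, `input_tokens`, `output_tokens`). Substitute your provider's published rates.

```python
# Hypothetical per-million-token prices in USD. These are placeholders,
# not real rates for any model.
PRICES_PER_MTOK = {
    "example-model": {"input": 3.00, "output": 15.00},
}


def token_cost_split(txns: list[dict]) -> dict[str, dict[str, float]]:
    """Return estimated input vs. output cost in USD for each model."""
    split: dict[str, dict[str, float]] = {}
    for t in txns:
        price = PRICES_PER_MTOK[t["model"]]
        entry = split.setdefault(t["model"], {"input_usd": 0.0, "output_usd": 0.0})
        entry["input_usd"] += t["input_tokens"] / 1_000_000 * price["input"]
        entry["output_usd"] += t["output_tokens"] / 1_000_000 * price["output"]
    return split
```

With these placeholder prices, an agent that reads 1,000,000 tokens and writes 200,000 spends about as much on output as on input, even though output is a fraction of the volume.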

4. Cost Per User (Multi-Tenant)

What it is: In apps where multiple users run agents, total spend attributable to each user.

Why it matters: In most user populations, a small percentage of users drives a disproportionate share of costs. Knowing which users are high-cost lets you enforce usage limits, design pricing tiers, or talk to those users directly.

Implementation: One ATXP agent account per user. User cost = sum of that account’s transactions.
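Once you have a total per user, ranking users by their share of overall spend shows how concentrated costs are. A minimal sketch, assuming you have already summed each account's transactions into a `spend_by_user` mapping (a hypothetical name):

```python
def user_cost_shares(spend_by_user: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (user, spend_usd, share_of_total) tuples, highest spender first."""
    total = sum(spend_by_user.values())
    ranked = sorted(spend_by_user.items(), key=lambda item: item[1], reverse=True)
    # Guard against division by zero when nobody has spent anything yet.
    return [(user, spend, spend / total if total else 0.0) for user, spend in ranked]
```

Reading the top few rows is usually enough to tell whether limits or pricing tiers are worth introducing.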

5. Budget Utilization Rate

What it is: What percentage of each agent’s allocated budget is being consumed per period.

Why it matters: Agents consistently hitting 90%+ of their budget need either a larger budget or better efficiency. Agents using only 5% of their budget probably have budgets that are too large: a runaway agent can burn through a lot of money before the limit stops it, and anomalies are harder to spot.
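The check itself is a simple ratio. This sketch uses the 90% and 5% figures above as default thresholds. The function name and status labels are illustrative.

```python
def budget_utilization(
    spent_usd: float,
    budget_usd: float,
    high: float = 0.90,
    low: float = 0.05,
) -> dict:
    """Classify an agent's budget usage for one period."""
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")
    rate = spent_usd / budget_usd
    if rate >= high:
        status = "near_limit"  # raise the budget or improve efficiency
    elif rate <= low:
        status = "oversized_budget"  # limit is too loose to catch anomalies
    else:
        status = "ok"
    return {"utilization": rate, "status": status}
```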

Anomaly Detection: The Practical Approach

Most agent cost anomalies follow one of three patterns:

Spike: Agent spends 10x its normal rate in a short window. Cause: loop, retries, unexpected prompt, tool failure causing repeated calls.

Gradual drift: Cost-per-task increases slowly over days/weeks. Cause: prompt changes that made the agent more verbose, model updates, accumulated conversation context bloating token counts.

Per-user outlier: One user’s agent costs 20x the median. Cause: unusual use patterns, adversarial inputs, or a genuine heavy user.

Simple alerting for the first two patterns:

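# The get_* helpers and alert() are placeholders for your own data access and paging.
def check_spend_anomalies(agent_id: str):
    # Spike check: last hour vs. trailing 7-day hourly average
    recent = get_spend_last_hour(agent_id)
    baseline = get_average_hourly_spend_last_7_days(agent_id)

    if baseline > 0 and recent > baseline * 3:
        alert(f"Agent {agent_id} spend spike: ${recent:.4f}/hr vs ${baseline:.4f}/hr baseline")

    # Gradual drift check: week-over-week cost per task
    this_week = get_cost_per_task_this_week(agent_id)
    last_week = get_cost_per_task_last_week(agent_id)

    if last_week > 0 and this_week > last_week * 1.5:
        pct = (this_week / last_week - 1) * 100  # report the actual increase, not a fixed 50%
        alert(f"Agent {agent_id} cost-per-task up {pct:.0f}%: ${this_week:.4f} vs ${last_week:.4f}")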
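The per-user outlier pattern can be checked in the same spirit by comparing each user's spend against the median. This is a sketch under assumptions: `spend_by_user` is a hypothetical mapping you build from per-account totals, and the 20x multiplier mirrors the example figure above.

```python
from statistics import median


def per_user_outliers(spend_by_user: dict[str, float], multiplier: float = 20.0) -> list[str]:
    """Return users whose spend exceeds multiplier x the median user's spend."""
    if not spend_by_user:
        return []
    med = median(spend_by_user.values())
    if med <= 0:
        return []  # no meaningful baseline when the median user spends nothing
    return sorted(user for user, spend in spend_by_user.items() if spend > multiplier * med)
```

A flagged user is not necessarily a problem. As noted above, it can just as easily be a genuine heavy user, so the output is a list to review, not a list to block.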

How ATXP Supports Analytics

ATXP’s transaction API gives you the raw data:

import os

import httpx

# Assumes the API key is supplied via an environment variable.
ATXP_API_KEY = os.environ["ATXP_API_KEY"]

def get_agent_analytics(agent_id: str, since: str) -> dict:
    headers = {"Authorization": f"Bearer {ATXP_API_KEY}"}

    # Transaction log
    txns_resp = httpx.get(
        f"https://api.atxp.ai/v1/agents/{agent_id}/transactions",
        headers=headers,
        params={"since": since},
    )
    txns_resp.raise_for_status()  # fail loudly instead of parsing an error body
    txns = txns_resp.json()["data"]

    # Current balance
    balance_resp = httpx.get(
        f"https://api.atxp.ai/v1/agents/{agent_id}/balance",
        headers=headers,
    )
    balance_resp.raise_for_status()
    balance = balance_resp.json()

    # Aggregate total spend and spend per model
    total_spend = sum(t["cost_usd"] for t in txns)
    by_model: dict[str, float] = {}
    for t in txns:
        by_model[t["model"]] = by_model.get(t["model"], 0) + t["cost_usd"]

    return {
        "total_spend_usd": total_spend,
        "call_count": len(txns),
        "by_model": by_model,
        "remaining_balance": balance["available_usd"],
        "transactions": txns,
    }

Every transaction includes: timestamp, model, input tokens, output tokens, cost, and service. That’s the raw material for every metric above.

The Operational Takeaway

Cost analytics is what turns agents from black boxes into accountable systems. You should know, at any moment:

  • What your agents cost per task
  • Whether that’s going up or down
  • Which agents are outliers
  • Whether any user is driving disproportionate spend

This is operational table stakes for any production agent deployment, not a nice-to-have. Build the analytics layer before you need it, not after an unexpected bill arrives.

ATXP’s transaction API provides the data layer for agent spending analytics — per-call cost records, balance tracking, and account isolation out of the box.