What is agent spending analytics?

Agent spending analytics is the practice of tracking, analyzing, and optimizing the costs incurred by AI agents — LLM API calls, external service purchases, compute, and tool usage. Good analytics tells you not just what you spent, but cost per task, cost per user, which agents are inefficient, and when something is behaving unexpectedly.

What metrics should I track for AI agent costs?

Core metrics: total spend by agent, cost per completed task, cost per user (for multi-tenant apps), model cost breakdown (input vs. output tokens), tool call frequency, and spend rate over time. Advanced: cost per task quality level, anomaly z-scores, and efficiency ratios comparing task complexity to cost.
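As a rough illustration of the anomaly z-score mentioned above, the sketch below scores one observation against a history of prior values. The function name and the choice of sample standard deviation are assumptions made for this example, not part of any particular tool.

```python
from statistics import mean, stdev
from typing import Optional


def spend_z_score(history: list[float], observed: float) -> Optional[float]:
    """Return how many standard deviations `observed` sits from the history mean.

    `history` holds earlier values of the same metric (for example, daily
    cost per task). Returns None when a z-score cannot be computed: fewer
    than two data points, or a history with zero variance.
    """
    if len(history) < 2:
        return None
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        return None
    return (observed - mean(history)) / sigma
```

A common convention is to flag values with a z-score above roughly 3, but the right cutoff depends on how noisy the metric is.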

How do I detect when an AI agent is spending abnormally?

Set a baseline spend rate for each agent type (e.g., $0.02 per research task), then alert when the observed rate exceeds 2-3x the baseline. Absolute spend alerts are too crude — an agent handling 10x more tasks should cost 10x more. Rate-based anomaly detection is what you want.

What's the difference between LLM observability and agent spending analytics?

LLM observability (tools like Langfuse, Helicone) tracks model-level metrics: latency, token counts, error rates. Agent spending analytics tracks business-level metrics: cost per task, cost per user, total workflow cost, budget utilization, and cost trends over time. They're complementary — observability for debugging, analytics for optimization and accountability.

How do I attribute agent costs to individual users in a multi-tenant app?

Create one ATXP agent account per user session or per user. When you query ATXP's transaction API with a user's agent ID, you get all costs attributable to that user. Sum these for billing, reporting, or usage limits. Don't try to attribute costs from a shared agent retroactively — it's much harder and less accurate.

Agent Spending Analytics: What to Track and Why

“How much did my agents spend last month?” is the wrong question.

The right questions are: what did each agent spend per completed task, which agents are becoming more expensive over time, and is there an agent that spent $50 on a task that should cost $0.50?

That’s the difference between cost reporting and agent spending analytics.

Why Basic Cost Tracking Isn’t Enough

Raw spend totals tell you whether to be alarmed. They don’t tell you:

Whether your agents are getting more or less efficient over time
Which users are driving disproportionate costs in a multi-tenant app
Which model selection is costing you more without better results
When an agent is looping or misbehaving (often visible as a cost spike before you notice the behavior)

Agents that handle more tasks should cost more. Agents that handle the same tasks at increasing cost are a problem. Those look identical in raw spend reports.

The Metrics That Matter

1. Cost Per Completed Task

What it is: Total agent spend divided by number of tasks completed in a time window.

Why it matters: Drift in cost-per-task usually signals a model regression, prompt degradation, or tool failure that’s causing the agent to retry more.

How to track it: Log task start/end events alongside ATXP transaction data. Match spend to task IDs.

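# Example cost-per-task calculation.
# get_transactions() and count_completed_tasks() are placeholders for your
# own data layer (ATXP transaction log and task event log, respectively).
def cost_per_task(agent_id: str, task_ids: list[str]) -> dict:
    txns = get_transactions(agent_id)
    tasks_completed = count_completed_tasks(task_ids)
    total_spend = sum(t["cost_usd"] for t in txns)
    return {
        "total_spend": total_spend,
        "tasks_completed": tasks_completed,
        # None (not 0) when nothing completed: spend with zero completed
        # tasks is a warning sign, not a free task.
        "cost_per_task": total_spend / tasks_completed if tasks_completed else None,
    }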
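Matching spend to task IDs can be sketched as a timestamp join, as below. This is a minimal sketch under stated assumptions: the field names "task_id", "started_at", "ended_at", and "timestamp" are hypothetical, timestamps are ISO 8601 strings, and each agent runs one task at a time so that task windows do not overlap.

```python
from datetime import datetime


def attribute_spend_to_tasks(tasks: list[dict], txns: list[dict]) -> dict[str, float]:
    """Sum transaction cost per task by matching timestamps to task windows.

    Transactions that fall outside every task window are collected under
    the key "unattributed" so that no spend silently disappears.
    """
    spend = {task["task_id"]: 0.0 for task in tasks}
    spend["unattributed"] = 0.0

    # Parse each task window once, up front.
    windows = [
        (
            datetime.fromisoformat(task["started_at"]),
            datetime.fromisoformat(task["ended_at"]),
            task["task_id"],
        )
        for task in tasks
    ]

    for txn in txns:
        ts = datetime.fromisoformat(txn["timestamp"])
        for start, end, task_id in windows:
            if start <= ts <= end:
                spend[task_id] += txn["cost_usd"]
                break
        else:
            # No window matched this transaction.
            spend["unattributed"] += txn["cost_usd"]
    return spend
```

If your agents run tasks concurrently, a time window is not enough to separate them; in that case it is more reliable to tag each call with a task ID at request time and group on that tag instead.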

2. Spend Rate (Per Hour/Day)

What it is: Rolling spend per unit time for each agent.

Why it matters: Anomalous spend rate is often the first signal of a misbehaving agent — before you notice wrong outputs or user complaints.

Threshold to alert: When observed spend rate exceeds 2-3x the trailing 7-day average for that agent.
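One way to compute that ratio directly from raw transactions is sketched below. It assumes each transaction is a dict with a "timestamp" (ISO 8601 string) and "cost_usd"; the function name and parameters are illustrative. The baseline window deliberately excludes the most recent window so that a spike does not inflate its own baseline.

```python
from datetime import datetime, timedelta
from typing import Optional


def spend_rate_ratio(
    txns: list[dict],
    now: datetime,
    window_hours: int = 1,
    baseline_days: int = 7,
) -> tuple[float, float, Optional[float]]:
    """Return (recent_rate, baseline_rate, ratio), with rates in USD per hour.

    ratio is None when there is no baseline spend to compare against.
    """
    recent_start = now - timedelta(hours=window_hours)
    baseline_start = recent_start - timedelta(days=baseline_days)

    recent_total = 0.0
    baseline_total = 0.0
    for txn in txns:
        ts = datetime.fromisoformat(txn["timestamp"])
        if recent_start < ts <= now:
            recent_total += txn["cost_usd"]
        elif baseline_start < ts <= recent_start:
            baseline_total += txn["cost_usd"]

    recent_rate = recent_total / window_hours
    baseline_rate = baseline_total / (baseline_days * 24)
    ratio = recent_rate / baseline_rate if baseline_rate > 0 else None
    return recent_rate, baseline_rate, ratio
```

An alert would then fire when the ratio is not None and exceeds the chosen multiple, for example 3.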

3. Model Cost Breakdown

What it is: Split of costs by model and by input vs. output tokens.

Why it matters: Output tokens are priced higher than input tokens on most hosted models, commonly by a factor of 3-5x depending on provider and model. Agents that generate verbose reasoning chains before answering have a different cost profile than terse, direct agents. Knowing this split shows where prompt optimization will pay off.
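The split can be sketched from per-call token counts as follows. Everything specific here is an assumption for illustration: the model names and per-token prices are made up, and the "input_tokens" and "output_tokens" field names are hypothetical. Substitute your provider's published rates and your actual transaction schema.

```python
# Hypothetical prices in USD per million tokens. Replace with real rates.
PRICES_PER_MTOK = {
    "model-small": {"input": 0.50, "output": 2.00},
    "model-large": {"input": 3.00, "output": 15.00},
}


def token_cost_breakdown(txns: list[dict]) -> dict[str, dict[str, float]]:
    """Split spend per model into input and output token cost.

    Raises KeyError for a model missing from PRICES_PER_MTOK, so an
    unpriced model is noticed rather than silently counted as free.
    """
    breakdown: dict[str, dict[str, float]] = {}
    for txn in txns:
        price = PRICES_PER_MTOK[txn["model"]]
        entry = breakdown.setdefault(
            txn["model"], {"input_usd": 0.0, "output_usd": 0.0}
        )
        entry["input_usd"] += txn["input_tokens"] / 1_000_000 * price["input"]
        entry["output_usd"] += txn["output_tokens"] / 1_000_000 * price["output"]

    # Fraction of each model's cost that comes from output tokens.
    for entry in breakdown.values():
        total = entry["input_usd"] + entry["output_usd"]
        entry["output_share"] = entry["output_usd"] / total if total else 0.0
    return breakdown
```

A high output share for an agent that is supposed to return short answers is a hint that it is generating more intermediate text than the task needs.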

4. Cost Per User (Multi-Tenant)

What it is: In apps where multiple users run agents, total spend attributable to each user.

Why it matters: A small percentage of users typically drives a disproportionate share of costs. Knowing which users are high-cost lets you enforce usage limits, design pricing tiers, or contact those users directly about their usage.

Implementation: One ATXP agent account per user. User cost = sum of that account’s transactions.
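Once per-user totals exist, measuring how concentrated the spend is takes only a few lines. The sketch below assumes you have already summed each user's transactions into a dict of user ID to USD spent; the function name and the 10% default are illustrative choices.

```python
def top_spenders(
    spend_by_user: dict[str, float], top_fraction: float = 0.1
) -> tuple[list[tuple[str, float]], float]:
    """Return the highest-spending users and their share of total spend.

    top_fraction is the fraction of users to include (0.1 = top 10%).
    At least one user is always returned when the input is non-empty.
    """
    ranked = sorted(spend_by_user.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return [], 0.0

    n_top = max(1, round(len(ranked) * top_fraction))
    top = ranked[:n_top]

    total = sum(spend_by_user.values())
    share = sum(cost for _, cost in top) / total if total else 0.0
    return top, share
```

Tracking that share over time shows whether costs are becoming more or less concentrated, which is useful input when setting usage limits or pricing tiers.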

5. Budget Utilization Rate

What it is: What percentage of each agent’s allocated budget is being consumed per period.

Why it matters: Agents consistently hitting 90%+ of their budget need either a larger budget or better efficiency. Agents using only 5% of their budget probably have budgets that are too large: a runaway agent could spend 20x its normal amount before the cap stops it, so the budget no longer works as an anomaly guard.
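A minimal sketch of that classification, using the 90% and 5% figures from the text as default thresholds. The function name and status labels are made up for this example.

```python
def budget_utilization(
    spent_usd: float,
    budget_usd: float,
    high: float = 0.90,
    low: float = 0.05,
) -> tuple[float, str]:
    """Return (utilization, status) for one agent over one budget period.

    status is "near_limit" at or above `high`, "oversized" at or below
    `low`, and "ok" in between.
    """
    if budget_usd <= 0:
        raise ValueError("budget_usd must be positive")

    utilization = spent_usd / budget_usd
    if utilization >= high:
        status = "near_limit"
    elif utilization <= low:
        status = "oversized"
    else:
        status = "ok"
    return utilization, status
```

Running this per agent at the end of each period gives a short list of budgets to raise and a short list to tighten.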

Anomaly Detection: The Practical Approach

Most agent cost anomalies follow one of three patterns:

Spike: Agent spends 10x its normal rate in a short window. Cause: loop, retries, unexpected prompt, tool failure causing repeated calls.

Gradual drift: Cost-per-task increases slowly over days/weeks. Cause: prompt changes that made the agent more verbose, model updates, accumulated conversation context bloating token counts.

Per-user outlier: One user’s agent costs 20x the median. Cause: unusual use patterns, adversarial inputs, or a genuine heavy user.

Simple alerting for the spike and gradual-drift patterns:

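def check_spend_anomalies(agent_id: str) -> None:
    # The get_* helpers and alert() are placeholders for your own
    # data layer and notification channel.

    # Spike check: last hour vs. trailing 7-day hourly average
    recent = get_spend_last_hour(agent_id)
    baseline = get_average_hourly_spend_last_7_days(agent_id)

    if baseline > 0 and recent > baseline * 3:
        alert(f"Agent {agent_id} spend spike: ${recent:.4f}/hr vs ${baseline:.4f}/hr baseline")

    # Gradual drift check: week-over-week cost per task
    this_week = get_cost_per_task_this_week(agent_id)
    last_week = get_cost_per_task_last_week(agent_id)

    if last_week > 0 and this_week > last_week * 1.5:
        # Report the actual increase rather than a fixed "50%"
        pct = (this_week / last_week - 1) * 100
        alert(f"Agent {agent_id} cost-per-task up {pct:.0f}%: ${this_week:.4f} vs ${last_week:.4f}")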
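The per-user outlier pattern is not covered by the function above. A minimal sketch of that check follows, comparing each user's spend to the median across users. The function name is hypothetical, and the default multiple of 20 simply mirrors the example figure in the pattern description; tune it to your own distribution.

```python
from statistics import median


def find_user_outliers(
    spend_by_user: dict[str, float], multiple: float = 20.0
) -> dict[str, float]:
    """Return {user_id: spend / median} for users above `multiple` x median.

    Users with zero spend are ignored when computing the median so that
    a large inactive population does not drag it toward zero.
    """
    active = [cost for cost in spend_by_user.values() if cost > 0]
    if not active:
        return {}

    med = median(active)
    return {
        user: cost / med
        for user, cost in spend_by_user.items()
        if cost > med * multiple
    }
```

The median is used instead of the mean because a single very heavy user pulls the mean upward and can hide their own outlier status.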

How ATXP Supports Analytics

ATXP’s transaction API gives you the raw data:

import os

import httpx

# Read the API key from the environment rather than hard-coding it
ATXP_API_KEY = os.environ["ATXP_API_KEY"]
BASE_URL = "https://api.atxp.ai/v1"

def get_agent_analytics(agent_id: str, since: str) -> dict:
    headers = {"Authorization": f"Bearer {ATXP_API_KEY}"}

    # Transaction log
    txns_resp = httpx.get(
        f"{BASE_URL}/agents/{agent_id}/transactions",
        headers=headers,
        params={"since": since},
        timeout=10.0,
    )
    txns_resp.raise_for_status()  # fail loudly on HTTP errors
    txns = txns_resp.json()["data"]

    # Current balance
    balance_resp = httpx.get(
        f"{BASE_URL}/agents/{agent_id}/balance",
        headers=headers,
        timeout=10.0,
    )
    balance_resp.raise_for_status()
    balance = balance_resp.json()

    total_spend = sum(t["cost_usd"] for t in txns)

    # Aggregate spend per model
    by_model: dict[str, float] = {}
    for t in txns:
        by_model[t["model"]] = by_model.get(t["model"], 0) + t["cost_usd"]

    return {
        "total_spend_usd": total_spend,
        "call_count": len(txns),
        "by_model": by_model,
        "remaining_balance": balance["available_usd"],
        "transactions": txns,
    }

Every transaction includes: timestamp, model, input tokens, output tokens, cost, and service. That’s the raw material for every metric above.

The Operational Takeaway

Cost analytics is what turns agents from black boxes into accountable systems. You should know, at any moment:

What your agents cost per task
Whether that’s going up or down
Which agents are outliers
Whether any user is driving disproportionate spend

This is operational table stakes for any production agent deployment, not a nice-to-have. Build the analytics layer before you need it — not after you get an unexpected bill.

ATXP’s transaction API provides the data layer for agent spending analytics — per-call cost records, balance tracking, and account isolation out of the box.