# How to Add ATXP to smolagents (HuggingFace)
smolagents is one of the cleaner paths to a working HuggingFace agent pipeline — but it ships with zero payment infrastructure. You get a capable, model-agnostic agent library that writes executable code or calls tools across nearly any LLM backend. What you don’t get is any answer to: “who pays for that web search?” or “how do I cap this agent’s spend before it burns $400 on inference overnight?”
This guide covers that gap: how to wire ATXP into a smolagents project so paid tool calls are handled cleanly, costs are tracked per call, and budget limits are actually enforced — regardless of which model backend you’re running.
## What Makes smolagents Different From Other Agent Frameworks?
Most agent framework integration guides assume you’re running on OpenAI. smolagents doesn’t. Released by HuggingFace in January 2025, it’s designed to be model-agnostic from the ground up. The HuggingFace model hub surpassed 1 million public models by Q1 2025 (HuggingFace, March 2025) — smolagents can run agents against any of them.
The supported model backends:
| Backend | What It Runs |
|---|---|
| HfApiModel | HuggingFace Inference API — Llama 3, Mistral, Qwen, Zephyr, and others |
| TransformersModel | Local inference via the transformers library |
| LiteLLMModel | Any provider LiteLLM supports: Anthropic, Cohere, Azure, Together AI |
| OpenAIServerModel | OpenAI or any OpenAI-compatible endpoint |
The model layer is a runtime swap. That’s a genuine feature — you can benchmark Llama 3.3 70B against Qwen2.5 Coder on the same pipeline without rewriting your agent logic. But it creates a billing problem that OpenAI-centric frameworks don’t have: each backend has a different cost structure, a different API key, and a different billing portal.
If your pipeline uses one model for orchestration and another for code generation — which is common in multi-agent smolagents setups — you’re now managing two separate billing relationships before you’ve even added any external tool calls.
## The Multi-Model Billing Problem in Production Pipelines
smolagents makes multi-agent orchestration straightforward. An orchestrator can spin up worker agents running different models:
```python
from smolagents import ToolCallingAgent, CodeAgent, HfApiModel

orchestrator = ToolCallingAgent(
    tools=[...],
    model=HfApiModel("meta-llama/Llama-3.3-70B-Instruct"),
    max_steps=6,
)

worker = CodeAgent(
    tools=[...],
    model=HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct"),
    max_steps=8,
)
```
Without a unified payment layer, tracking the actual cost of a pipeline run means reconciling two HuggingFace Inference API invoices. Add a web search tool that calls a third-party API and you have three billing sources. Add image generation and it’s four.
The real cost of managing 7 AI APIs yourself isn’t just money — it’s key rotation across every service, separate rate limit handling, per-provider dashboards, and no single view of what any given agent run actually cost.
ATXP collapses this. One credential, one ledger, one dashboard — regardless of which models or tools are running.
## How ATXP Fits Into smolagents
In a smolagents project, ATXP serves as the payment layer for tool calls that cost money. The integration point is the @tool decorator.
In smolagents, tools are typed Python functions. Whatever a tool does internally — HTTP calls, subprocess invocations, SDK calls — is invisible to the agent. The agent sees a function signature and a docstring. ATXP calls live inside the tool implementation, not in the agent’s decision loop:
```
smolagents agent
└── calls tool("query")
    └── tool makes ATXP-authenticated request
        └── ATXP handles payment + credential for underlying API
            └── result returned to agent
```
No agent code changes. Payment logic is contained entirely in the tool layer. That means you can retrofit ATXP into an existing smolagents project by updating tool implementations — nothing else changes.
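To make the retrofit concrete, here is a schematic before/after for a single tool body. Only request construction is shown (no network calls), the third-party endpoint and its header name are hypothetical, and the ATXP base URL matches the one used later in this guide:

```python
import os

ATXP_BASE = "https://api.atxp.ai/v1"

def direct_search_request(query: str) -> dict:
    """Before: the tool targets a third-party search API directly,
    with a provider-specific key you rotate yourself. The endpoint
    and header name here are hypothetical."""
    return {
        "url": "https://search.example.com/query",
        "headers": {"X-Api-Key": os.environ.get("SEARCH_API_KEY", "")},
        "json": {"q": query},
    }

def atxp_search_request(query: str) -> dict:
    """After: same tool, same agent-facing signature; only the request
    target and credential change. ATXP authenticates and pays the
    underlying API on your behalf."""
    return {
        "url": f"{ATXP_BASE}/search",
        "headers": {"Authorization": f"Bearer {os.environ.get('ATXP_API_KEY', '')}"},
        "json": {"query": query},
    }
```

The agent never sees the difference: the function signature and docstring are unchanged, so no prompt, tool schema, or orchestration code moves.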
## Setting Up ATXP
Get your API key from atxp.ai. Fund your agent wallet — ATXP credits are the spend unit, equivalent to USD at a 1:1 rate for most tools.
```bash
pip install smolagents requests
export ATXP_API_KEY="your-atxp-key"
export HF_TOKEN="your-huggingface-token"
```
## Building an ATXP-Backed Tool
Here’s a web search tool that routes through ATXP:
```python
import os

import requests
from smolagents import tool

ATXP_BASE = "https://api.atxp.ai/v1"

def _atxp_headers() -> dict:
    return {
        "Authorization": f"Bearer {os.environ['ATXP_API_KEY']}",
        "Content-Type": "application/json",
    }

@tool
def web_search(query: str) -> str:
    """Search the web for current information on any topic.

    Args:
        query: The search query string. Be specific for better results.
    """
    resp = requests.post(
        f"{ATXP_BASE}/search",
        headers=_atxp_headers(),
        json={"query": query},
        timeout=15,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    return "\n\n".join(
        f"**{r['title']}**\n{r['snippet']}" for r in results[:5]
    )
```
The docstring format is load-bearing. smolagents parses the Args: block to construct the tool description it passes to the model. Vague argument descriptions produce worse tool calls — be specific about what the argument expects.
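To see why specificity matters, here is a simplified sketch of Args:-block extraction. This is not smolagents' actual parser; it only illustrates that whatever text follows the colon is handed to the model verbatim as the argument description:

```python
def parse_args_block(docstring: str) -> dict:
    """Extract 'name: description' pairs from an Args: block.

    Simplified illustration, not smolagents' real parser: it shows
    that the description text you write is exactly what the model
    sees when deciding how to fill the argument.
    """
    args = {}
    in_args = False
    for line in docstring.splitlines():
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
            continue
        if in_args and ":" in stripped:
            name, _, desc = stripped.partition(":")
            args[name.strip()] = desc.strip()
    return args


doc = """Search the web for current information on any topic.

Args:
    query: The search query string. Be specific for better results.
"""
```

A vague description like "query: the input" gives the model nothing to work with; the specific version above tells it what shape of string produces good results.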
## Full Agent Example With ToolCallingAgent
```python
import os

from smolagents import ToolCallingAgent, HfApiModel

agent = ToolCallingAgent(
    tools=[web_search],
    model=HfApiModel(
        "meta-llama/Llama-3.3-70B-Instruct",
        token=os.environ["HF_TOKEN"],
    ),
    max_steps=6,
)

result = agent.run(
    "What are the three most-cited papers on agent memory architectures "
    "from 2025? Include author names and publication venues."
)
print(result)
```
Every web_search call routes through ATXP. Spend appears in your ATXP dashboard with timestamps and per-call cost. The agent has no visibility into the payment layer — it called a function and received a string.
## CodeAgent Variation
smolagents’ CodeAgent writes executable Python rather than JSON tool calls. The ATXP integration is identical — tools are still the integration boundary:
```python
from smolagents import CodeAgent, HfApiModel

agent = CodeAgent(
    tools=[web_search],
    model=HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct"),
    max_steps=8,
    additional_authorized_imports=["re", "json"],
)

result = agent.run(
    "Search for the HuggingFace Open LLM Leaderboard top 5 models "
    "and return their names and average scores as a Python dict."
)
```
CodeAgent generates Python that calls web_search(...) as a function and executes it in a sandboxed environment. The ATXP HTTP call inside web_search goes out from that sandbox — ensure your execution environment has outbound network access if you’re running local inference with TransformersModel.
Ready to add a payment layer to your smolagents project? Register your agent on atxp.ai — fund a wallet and your first tool calls are covered by $5 in free credits.
## Scaling to Multiple Paid Tools
The pattern extends to any number of ATXP-backed tools:
```python
@tool
def generate_image(prompt: str, width: int = 1024, height: int = 1024) -> str:
    """Generate an image from a text description using AI.

    Args:
        prompt: Detailed visual description of the image to generate.
        width: Output image width in pixels. Defaults to 1024.
        height: Output image height in pixels. Defaults to 1024.
    """
    resp = requests.post(
        f"{ATXP_BASE}/image",
        headers=_atxp_headers(),
        json={"prompt": prompt, "width": width, "height": height},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["url"]

agent = ToolCallingAgent(
    tools=[web_search, generate_image],
    model=HfApiModel("meta-llama/Llama-3.3-70B-Instruct"),
    max_steps=10,
)
```
Both tools charge against the same ATXP wallet. Cost attribution is per-call — the ATXP dashboard shows that the agent made 3 search calls ($0.03) and 1 image generation ($0.04), totaling $0.07 for the run. You don’t need to stitch that together from separate provider invoices.
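If you export that per-call data, rolling it up by run is a few lines. A small sketch, assuming a per-call entry shape of {"tool": ..., "cost_usd": ...} — an illustrative format, not a documented ATXP ledger schema:

```python
from collections import defaultdict

def summarize_run(ledger: list) -> dict:
    """Roll per-call ledger entries up into per-tool and total views.
    The entry shape ({'tool': ..., 'cost_usd': ...}) is an assumed
    export format, not a documented ATXP schema."""
    per_tool = defaultdict(lambda: {"calls": 0, "cost_usd": 0.0})
    for entry in ledger:
        bucket = per_tool[entry["tool"]]
        bucket["calls"] += 1
        bucket["cost_usd"] = round(bucket["cost_usd"] + entry["cost_usd"], 4)
    total = round(sum(b["cost_usd"] for b in per_tool.values()), 4)
    return {"per_tool": dict(per_tool), "total_usd": total}

# The run described above: 3 searches at $0.01 and 1 image at $0.04
ledger = [
    {"tool": "web_search", "cost_usd": 0.01},
    {"tool": "web_search", "cost_usd": 0.01},
    {"tool": "web_search", "cost_usd": 0.01},
    {"tool": "generate_image", "cost_usd": 0.04},
]
```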
For context on how agents pay for API calls at the infrastructure level, see How Agents Pay for API Calls under Further Reading; it covers the underlying mechanics.
## How Does ATXP Enforce Budget Limits for smolagents?
smolagents’ max_steps limits reasoning iterations but does nothing about cost. A single tool call can be expensive regardless of how many steps the agent takes. ATXP enforces budget at the wallet level, which is the right abstraction for spend control.
Enforcement options:
- Pre-loaded wallet ceiling — fund a dedicated wallet with a fixed amount. The agent can’t exceed it. When the balance hits zero, the ATXP API returns HTTP 402, and the tool raises an exception.
- Per-session limits — set a cap on a per-request basis via the ATXP API when initializing tool sessions.
- Spend webhooks — ATXP can POST to your endpoint when spend crosses a configured threshold, letting you halt agents before they hit the ceiling.
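A sketch of a webhook handler implementing the halt-before-ceiling idea. The payload field names here are assumptions, not a documented ATXP schema, and stop_agent stands in for whatever kill switch your orchestration exposes:

```python
def handle_spend_webhook(payload: dict, stop_agent) -> str:
    """Halt agents when an ATXP spend notification shows the wallet
    at or below a configured threshold. Field names are assumed,
    not a documented ATXP payload schema."""
    balance = payload["wallet_balance_usd"]
    if balance <= 0:
        stop_agent("wallet empty")          # hard stop: nothing left to spend
        return "halted"
    if balance <= payload["threshold_usd"]:
        stop_agent("approaching ceiling")   # soft stop before the 402s begin
        return "halted"
    return "ok"
```

Wire this behind whatever HTTP endpoint you register with ATXP; the decision logic itself is framework-agnostic.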
Handle the 402 explicitly for graceful degradation:
```python
@tool
def web_search(query: str) -> str:
    """Search the web for current information on any topic.

    Args:
        query: The search query string.
    """
    try:
        resp = requests.post(
            f"{ATXP_BASE}/search",
            headers=_atxp_headers(),
            json={"query": query},
            timeout=15,
        )
        resp.raise_for_status()
        return "\n\n".join(
            r["snippet"] for r in resp.json().get("results", [])[:5]
        )
    except requests.HTTPError as e:
        if e.response.status_code == 402:
            return "Search unavailable: agent wallet balance insufficient."
        raise
```
Returning a descriptive string instead of raising lets the agent continue with cached context — useful for non-critical lookups. For tools where a payment failure should hard-stop the run, let the exception propagate and smolagents’ step failure handling takes over.
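One way to keep that degrade-or-stop decision out of individual tool bodies is a small per-tool policy helper. This is an illustrative pattern, not part of smolagents or ATXP:

```python
def payment_failure_policy(status_code: int, critical: bool) -> str:
    """Map an ATXP error response to an action: 'degrade' returns a
    descriptive string so the agent continues, 'retry' re-attempts a
    transient server error, 'raise' propagates the exception so
    smolagents' step failure handling stops the run. Illustrative
    thresholds only."""
    if status_code == 402:              # payment required: wallet exhausted
        return "raise" if critical else "degrade"
    if 500 <= status_code < 600:        # transient server-side error
        return "retry"
    return "raise"                      # anything else is a hard failure
```

A tool's except block can call this with its status code and a per-tool criticality flag, then act on the returned string.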
The IOU token spending-limits model, covered in The Three Models for Agent Payments under Further Reading, has more detail on the credit mechanics underlying wallet enforcement.
## smolagents Integration Approach Comparison
| Approach | Cost tracking | Credential management | Budget enforcement | Works with non-OpenAI models |
|---|---|---|---|---|
| Direct API keys per tool | Per-provider dashboards only | Manual rotation per key | None | Yes |
| LangChain callbacks | Partial (requires custom callback) | Manual | None | Yes |
| OpenAI usage tracking | OpenAI only | N/A | Soft limits via OpenAI dashboard | OpenAI only |
| ATXP | Per-call, unified ledger | Handled by ATXP | Wallet ceiling + webhooks | Yes — any model backend |
## ATXP Tool Cost Reference
| Tool | Approximate cost per call | Notes |
|---|---|---|
| Web search | ~$0.01 | Per query, returns up to 10 results |
| AI image generation | ~$0.04–$0.08 | Varies by resolution |
| AI video generation | ~$0.50–$2.00 | Varies by length and resolution |
| Email send | ~$0.001 | Per outbound message |
| SMS | ~$0.01 | Per message |
| 100+ LLM models | Varies | Pass-through at cost |
Current pricing is always available at atxp.ai. These figures are approximate.
## Frequently Asked Questions
### Does ATXP work with local models via TransformersModel?
Yes. ATXP is decoupled from the model backend entirely. Whether your agent runs Llama on a local GPU via TransformersModel, a hosted model via HfApiModel, or GPT-4o via OpenAIServerModel, the tool layer is identical. ATXP only touches tool calls — it has no visibility into model inference.
### Can I track cost per agent run rather than per individual tool call?
ATXP’s dashboard shows per-call data. To group by run, generate a UUID at the start of each agent run and pass it as a request header or in the JSON payload for every tool call. ATXP surfaces it in the ledger, letting you filter to see total spend for a specific run. This is the same pattern used in multi-agent system builds where attributing cost to a specific job matters.
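A minimal sketch of that tagging pattern. The X-Run-Id header name is a suggestion for your own convention, not a documented ATXP field:

```python
import uuid

def new_run_headers(api_key: str) -> dict:
    """Build headers for one agent run: a fresh UUID tags every tool
    call made during that run so ledger entries can be grouped later.
    'X-Run-Id' is a suggested header name, not a documented ATXP field."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "X-Run-Id": str(uuid.uuid4()),
    }

headers = new_run_headers("demo-key")
# Reuse this dict for every tool call inside one agent.run(); mint a
# new dict per run so two runs never share an id.
```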
### smolagents supports running agents in parallel. Does ATXP handle concurrent tool calls?
Yes. ATXP’s API is stateless per-call — concurrent requests from multiple agents or threads deduct from the same wallet atomically. If you’re running a managed agent pool where multiple workers fire tool calls simultaneously, each deducts independently without race conditions. The wallet can reach zero but won’t go negative.
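Those semantics — atomic deduction, a balance that floors at zero — can be mirrored locally for tests or pre-flight budget checks. A purely illustrative sketch; the authoritative accounting happens server-side at ATXP:

```python
import threading

class LocalWalletMirror:
    """Local illustration of the wallet semantics described above:
    deductions are atomic under a lock, the balance can reach zero,
    and a deduction that would go negative is refused. Illustrative
    only -- the real ledger lives at ATXP."""

    def __init__(self, balance_usd: float):
        self._balance = balance_usd
        self._lock = threading.Lock()

    def deduct(self, amount_usd: float) -> bool:
        with self._lock:
            if self._balance - amount_usd < 0:
                return False  # refused: wallet never goes negative
            self._balance = round(self._balance - amount_usd, 4)
            return True

    @property
    def balance(self) -> float:
        return self._balance
```

Multiple worker threads can call deduct concurrently; the lock serializes the check-and-subtract so no interleaving can overdraw the balance.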
### What happens to the agent run if ATXP is unavailable?
The tool raises an exception — either from a network timeout or a non-2xx response. smolagents surfaces this as a step failure. Design your tools with appropriate timeouts (the examples here use timeout=15 for search, timeout=60 for image generation) and decide at the tool level whether to retry, degrade gracefully, or let the exception propagate and stop the run.
### Does this work with the smolagents GradioUI?
Yes. GradioUI(agent).launch() wraps the same agent object — tool calls go through ATXP identically whether the agent is invoked programmatically or via the Gradio interface. The payment layer is inside the tool implementation, not in the agent’s execution path.
## Further Reading
- How Agents Pay for API Calls — the infrastructure mechanics behind agent payment models
- The Three Models for Agent Payments — IOU tokens, virtual cards, and crypto compared
- How to Build a Multi-Agent System With ATXP — orchestrator and worker patterns with shared wallets