How to Add ATXP to smolagents (HuggingFace)

smolagents is one of the cleaner paths to a working agent pipeline on HuggingFace — but it ships with zero payment infrastructure. You get a capable, model-agnostic agent library that writes executable code or calls tools across nearly any LLM backend. What you don’t get is any answer to: “who pays for that web search?” or “how do I cap this agent’s spend before it burns $400 on inference overnight?”

This guide covers that gap: how to wire ATXP into a smolagents project so paid tool calls are handled cleanly, costs are tracked per call, and budget limits are actually enforced — regardless of which model backend you’re running.

What Makes smolagents Different From Other Agent Frameworks?

Most agent framework integration guides assume you’re running on OpenAI. smolagents doesn’t. Released by HuggingFace in January 2025, it’s designed to be model-agnostic from the ground up. The HuggingFace model hub surpassed 1 million public models by Q1 2025 (HuggingFace, March 2025) — smolagents can run agents against any of them.

The supported model backends:

| Backend | What It Runs |
| --- | --- |
| HfApiModel | HuggingFace Inference API — Llama 3, Mistral, Qwen, Zephyr, and others |
| TransformersModel | Local inference via the transformers library |
| LiteLLMModel | Any provider LiteLLM supports: Anthropic, Cohere, Azure, Together AI |
| OpenAIServerModel | OpenAI or any OpenAI-compatible endpoint |

The model layer is a runtime swap. That’s a genuine feature — you can benchmark Llama 3.3 70B against Qwen2.5 Coder on the same pipeline without rewriting your agent logic. But it creates a billing problem that OpenAI-centric frameworks don’t have: each backend has a different cost structure, a different API key, and a different billing portal.

If your pipeline uses one model for orchestration and another for code generation — which is common in multi-agent smolagents setups — you’re now managing two separate billing relationships before you’ve even added any external tool calls.

The Multi-Model Billing Problem in Production Pipelines

smolagents makes multi-agent orchestration straightforward. An orchestrator can spin up worker agents running different models:

from smolagents import ToolCallingAgent, CodeAgent, HfApiModel

orchestrator = ToolCallingAgent(
    tools=[...],
    model=HfApiModel("meta-llama/Llama-3.3-70B-Instruct"),
    max_steps=6,
)

worker = CodeAgent(
    tools=[...],
    model=HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct"),
    max_steps=8,
)

Without a unified payment layer, tracking the actual cost of a pipeline run means reconciling two HuggingFace Inference API invoices. Add a web search tool that calls a third-party API and you have three billing sources. Add image generation and it’s four.

The real cost of managing 7 AI APIs yourself isn’t just money — it’s key rotation across every service, separate rate limit handling, per-provider dashboards, and no single view of what any given agent run actually cost.

ATXP collapses this. One credential, one ledger, one dashboard — regardless of which models or tools are running.

How ATXP Fits Into smolagents

In a smolagents project, ATXP serves as the payment layer for tool calls that cost money. The integration point is the @tool decorator.

In smolagents, tools are typed Python functions. Whatever a tool does internally — HTTP calls, subprocess invocations, SDK calls — is invisible to the agent. The agent sees a function signature and a docstring. ATXP calls live inside the tool implementation, not in the agent’s decision loop:

smolagents agent
    └── calls tool("query")
        └── tool makes ATXP-authenticated request
            └── ATXP handles payment + credential for underlying API
                └── result returned to agent

No agent code changes. Payment logic is contained entirely in the tool layer. That means you can retrofit ATXP into an existing smolagents project by updating tool implementations — nothing else changes.

Setting Up ATXP

Get your API key from atxp.ai. Fund your agent wallet — ATXP credits are the spend unit, equivalent to USD at a 1:1 rate for most tools.

pip install smolagents requests
export ATXP_API_KEY="your-atxp-key"
export HF_TOKEN="your-huggingface-token"
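Before wiring up tools, it helps to fail fast when either credential is missing. A short sketch using the variable names from the setup step above (the helper name is made up for this sketch):

```python
import os

def check_env(required=("ATXP_API_KEY", "HF_TOKEN")) -> list[str]:
    """Return the names of any required environment variables that are unset."""
    return [name for name in required if not os.environ.get(name)]

# Warn up front rather than failing mid-run inside a tool call.
if missing := check_env():
    print(f"Missing environment variables: {', '.join(missing)}")
```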

Building an ATXP-Backed Tool

Here’s a web search tool that routes through ATXP:

import os
import requests
from smolagents import tool

ATXP_BASE = "https://api.atxp.ai/v1"

def _atxp_headers() -> dict:
    return {
        "Authorization": f"Bearer {os.environ['ATXP_API_KEY']}",
        "Content-Type": "application/json",
    }

@tool
def web_search(query: str) -> str:
    """Search the web for current information on any topic.

    Args:
        query: The search query string. Be specific for better results.
    """
    resp = requests.post(
        f"{ATXP_BASE}/search",
        headers=_atxp_headers(),
        json={"query": query},
        timeout=15,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    return "\n\n".join(
        f"**{r['title']}**\n{r['snippet']}" for r in results[:5]
    )

The docstring format is load-bearing. smolagents parses the Args: block to construct the tool description it passes to the model. Vague argument descriptions produce worse tool calls — be specific about what the argument expects.
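To see why, here is a rough, standalone illustration of extracting the Args: block from a docstring. This mimics the general idea, not smolagents’ actual parser — the helper and sample function names are made up for this sketch:

```python
import inspect

def args_block(fn) -> dict[str, str]:
    """Pull the Args: section of a docstring into {name: description}.
    A rough stand-in for what a framework sees when building a tool schema."""
    out, in_args = {}, False
    for line in (inspect.getdoc(fn) or "").splitlines():
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
        elif in_args and ":" in stripped:
            name, desc = stripped.split(":", 1)
            out[name.strip()] = desc.strip()
    return out

def sample_tool(query: str) -> str:
    """Search the web for current information on any topic.

    Args:
        query: The search query string. Be specific for better results.
    """

print(args_block(sample_tool))
# {'query': 'The search query string. Be specific for better results.'}
```

Whatever text you put after `query:` is effectively the prompt the model sees when deciding how to call the tool.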

Full Agent Example With ToolCallingAgent

import os
from smolagents import ToolCallingAgent, HfApiModel

agent = ToolCallingAgent(
    tools=[web_search],
    model=HfApiModel(
        "meta-llama/Llama-3.3-70B-Instruct",
        token=os.environ["HF_TOKEN"],
    ),
    max_steps=6,
)

result = agent.run(
    "What are the three most-cited papers on agent memory architectures "
    "from 2025? Include author names and publication venues."
)
print(result)

Every web_search call routes through ATXP. Spend appears in your ATXP dashboard with timestamps and per-call cost. The agent has no visibility into the payment layer — it called a function and received a string.

CodeAgent Variation

smolagents’ CodeAgent writes executable Python rather than JSON tool calls. The ATXP integration is identical — tools are still the integration boundary:

from smolagents import CodeAgent, HfApiModel

agent = CodeAgent(
    tools=[web_search],
    model=HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct"),
    max_steps=8,
    additional_authorized_imports=["re", "json"],
)

result = agent.run(
    "Search for the HuggingFace Open LLM Leaderboard top 5 models "
    "and return their names and average scores as a Python dict."
)

CodeAgent generates Python that calls web_search(...) as a function and executes it in a sandboxed environment. The ATXP HTTP call inside web_search goes out from that sandbox — ensure your execution environment has outbound network access if you’re running local inference with TransformersModel.
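If you are unsure whether the sandbox has outbound access, a quick TCP check (using the API host from this guide as the default) can confirm before a run. The helper name is made up for this sketch:

```python
import socket

def has_outbound(host: str = "api.atxp.ai", port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run this from inside the same environment that executes the generated code; a False here means ATXP-backed tools will fail with network errors.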


Ready to add a payment layer to your smolagents project? Register your agent on atxp.ai — fund a wallet and your first tool calls are covered by $5 in free credits.


Scaling to Multiple Paid Tools

The pattern extends to any number of ATXP-backed tools:

@tool
def generate_image(prompt: str, width: int = 1024, height: int = 1024) -> str:
    """Generate an image from a text description using AI.

    Args:
        prompt: Detailed visual description of the image to generate.
        width: Output image width in pixels. Defaults to 1024.
        height: Output image height in pixels. Defaults to 1024.
    """
    resp = requests.post(
        f"{ATXP_BASE}/image",
        headers=_atxp_headers(),
        json={"prompt": prompt, "width": width, "height": height},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["url"]

agent = ToolCallingAgent(
    tools=[web_search, generate_image],
    model=HfApiModel("meta-llama/Llama-3.3-70B-Instruct"),
    max_steps=10,
)

Both tools charge against the same ATXP wallet. Cost attribution is per-call — the ATXP dashboard shows that the agent made 3 search calls ($0.03) and 1 image generation ($0.04), totaling $0.07 for the run. You don’t need to stitch that together from separate provider invoices.
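For quick pre-run budgeting you can estimate spend from expected call counts. The prices below are the approximate figures from the cost reference later in this guide, not live pricing — check atxp.ai for current numbers:

```python
# Approximate per-call prices in USD; see atxp.ai for current figures.
PRICES = {"search": 0.01, "image": 0.04}

def estimate_run_cost(calls: dict[str, int]) -> float:
    """Estimate total spend for a run from per-tool call counts."""
    return round(sum(PRICES[tool] * n for tool, n in calls.items()), 4)

# The run described above: 3 searches and 1 image generation.
print(estimate_run_cost({"search": 3, "image": 1}))  # 0.07
```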

For context on how agents pay for API calls at the infrastructure level, a separate post covers the underlying mechanics.

How Does ATXP Enforce Budget Limits for smolagents?

smolagents’ max_steps limits reasoning iterations but does nothing about cost. A single tool call can be expensive regardless of how many steps the agent takes. ATXP enforces budget at the wallet level, which is the right abstraction for spend control.

Enforcement options:

  1. Pre-loaded wallet ceiling — fund a dedicated wallet with a fixed amount. The agent can’t exceed it. When the balance hits zero, the ATXP API returns HTTP 402, and the tool raises an exception.
  2. Per-session limits — set a cap on a per-request basis via the ATXP API when initializing tool sessions.
  3. Spend webhooks — ATXP can POST to your endpoint when spend crosses a configured threshold, letting you halt agents before they hit the ceiling.
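The webhook option needs only a small amount of receiver logic. Here is a sketch of the decision step; the payload field names ("spend", "ceiling") are assumptions about the event shape, not documented ATXP fields:

```python
def should_halt(event: dict, warn_fraction: float = 0.9) -> bool:
    """Decide whether to stop agents based on a spend-threshold event.
    Field names ('spend', 'ceiling') are assumed, not documented ATXP fields."""
    spend = float(event.get("spend", 0.0))
    ceiling = float(event.get("ceiling", float("inf")))
    return spend >= warn_fraction * ceiling
```

Wire this into whatever HTTP endpoint receives the ATXP POST, and cancel in-flight agent runs when it returns True.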

Handle the 402 explicitly for graceful degradation:

@tool
def web_search(query: str) -> str:
    """Search the web for current information on any topic.

    Args:
        query: The search query string.
    """
    try:
        resp = requests.post(
            f"{ATXP_BASE}/search",
            headers=_atxp_headers(),
            json={"query": query},
            timeout=15,
        )
        resp.raise_for_status()
        return "\n\n".join(
            r["snippet"] for r in resp.json().get("results", [])[:5]
        )
    except requests.HTTPError as e:
        if e.response.status_code == 402:
            return "Search unavailable: agent wallet balance insufficient."
        raise

Returning a descriptive string instead of raising lets the agent continue with cached context — useful for non-critical lookups. For tools where a payment failure should hard-stop the run, let the exception propagate and smolagents’ step failure handling takes over.

The IOU token spending limits model has more detail on the credit mechanics underlying wallet enforcement.

smolagents Integration Approach Comparison

| Approach | Cost tracking | Credential management | Budget enforcement | Works with non-OpenAI models |
| --- | --- | --- | --- | --- |
| Direct API keys per tool | Per-provider dashboards only | Manual rotation per key | None | Yes |
| LangChain callbacks | Partial (requires custom callback) | Manual | None | Yes |
| OpenAI usage tracking | OpenAI only | N/A | Soft limits via OpenAI dashboard | OpenAI only |
| ATXP | Per-call, unified ledger | Handled by ATXP | Wallet ceiling + webhooks | Yes — any model backend |

ATXP Tool Cost Reference

| Tool | Approximate cost per call | Notes |
| --- | --- | --- |
| Web search | ~$0.01 | Per query, returns up to 10 results |
| AI image generation | ~$0.04–$0.08 | Varies by resolution |
| AI video generation | ~$0.50–$2.00 | Varies by length and resolution |
| Email send | ~$0.001 | Per outbound message |
| SMS | ~$0.01 | Per message |
| 100+ LLM models | Varies | Pass-through at cost |

Current pricing is always available at atxp.ai. These figures are approximate.

Frequently Asked Questions

Does ATXP work with local models via TransformersModel?

Yes. ATXP is decoupled from the model backend entirely. Whether your agent runs Llama on a local GPU via TransformersModel, a hosted model via HfApiModel, or GPT-4o via OpenAIServerModel, the tool layer is identical. ATXP only touches tool calls — it has no visibility into model inference.

Can I track cost per agent run rather than per individual tool call?

ATXP’s dashboard shows per-call data. To group by run, generate a UUID at the start of each agent run and pass it as a request header or in the JSON payload for every tool call. ATXP surfaces it in the ledger, letting you filter to see total spend for a specific run. This is the same pattern used in multi-agent system builds where attributing cost to a specific job matters.

smolagents supports running agents in parallel. Does ATXP handle concurrent tool calls?

Yes. ATXP’s API is stateless per-call — concurrent requests from multiple agents or threads deduct from the same wallet atomically. If you’re running a managed agent pool where multiple workers fire tool calls simultaneously, each deducts independently without race conditions. The wallet can reach zero but won’t go negative.

What happens to the agent run if ATXP is unavailable?

The tool raises an exception — either from a network timeout or a non-2xx response. smolagents surfaces this as a step failure. Design your tools with appropriate timeouts (the examples here use timeout=15 for search, timeout=60 for image generation) and decide at the tool level whether to retry, degrade gracefully, or let the exception propagate and stop the run.
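A retry policy is easy to express as a generic wrapper. Note that HTTP 402 (payment required) should be excluded from retry, since retrying an empty wallet only burns time. This is an illustrative sketch, not an ATXP SDK feature:

```python
import time

def with_retry(call, *, retries=2, backoff=0.1, retry_on=(TimeoutError,)):
    """Invoke call(); retry transient failures with exponential backoff.
    Keep payment failures (HTTP 402) OUT of retry_on: they won't resolve
    by waiting, only by funding the wallet."""
    for attempt in range(retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == retries:
                raise
            time.sleep(backoff * (2 ** attempt))
```

Inside a tool, you would wrap the requests.post call and map requests.Timeout into retry_on, while letting the 402 branch return its descriptive string as shown earlier.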

Does this work with the smolagents GradioUI?

Yes. GradioUI(agent).launch() wraps the same agent object — tool calls go through ATXP identically whether the agent is invoked programmatically or via the Gradio interface. The payment layer is inside the tool implementation, not in the agent’s execution path.

Further Reading