How to Add ATXP to Haystack AI Agents
How to Add ATXP to Haystack AI Agents
Haystack AI agent payments get complicated fast. A production Haystack pipeline routinely calls three to five paid APIs per request — an embedder, a hosted vector retriever, a reranker, and a generator. Each one bills separately, carries separate credentials, and generates a separate invoice. ATXP consolidates all of it: one account, one balance, one receipt per pipeline run.
This guide covers Haystack’s Pipeline and component model, how to wire ATXP in as the payment layer for external tool calls, and the cost profile of hybrid retrieval pipelines where embedding, reranking, and generation each have distinct billing behavior.
What Is Haystack’s Pipeline and Component Model?
Haystack 2.x organizes everything as components — Python classes decorated with @component that declare typed inputs and outputs. Pipelines connect components together via those typed interfaces. A standard RAG pipeline:
from haystack import Pipeline
from haystack.components.embedders import OpenAITextEmbedder
from haystack.components.retrievers import InMemoryEmbeddingRetriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
pipeline = Pipeline()
pipeline.add_component("embedder", OpenAITextEmbedder())
pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=store))
pipeline.add_component("builder", PromptBuilder(template=template))
pipeline.add_component("generator", OpenAIGenerator(model="gpt-4o"))
pipeline.connect("embedder.embedding", "retriever.query_embedding")
pipeline.connect("retriever.documents", "builder.documents")
pipeline.connect("builder.prompt", "generator.prompt")
Each component in this graph calls at least one external API. The embedder hits OpenAI’s embedding endpoint. The retriever may call Pinecone or Weaviate. The generator calls a chat completion API. Add a reranker (Cohere, Jina AI) and you have four separate billing relationships for a single user query.
Haystack also ships an Agent class built on the same component model. Agents wrap a generator with tool-use capabilities, letting the model invoke Python functions — or external paid APIs — as needed. That tool-call layer is exactly where ATXP slots in cleanly.
What Do Haystack AI Agent Payments Actually Cost in a Hybrid Pipeline?
Hybrid retrieval combines dense (embedding) and sparse (BM25) search, then reranks the merged result set. It’s the standard architecture for production RAG that needs both precision and recall. Here’s what that costs per 1,000 queries using 2026 list pricing:
| Component | Provider | API / Model | Est. cost per 1K queries |
|---|---|---|---|
| Query embedding | OpenAI | text-embedding-3-small | ~$0.02 |
| Dense retrieval | Pinecone | Serverless, 1K reads | ~$0.08 |
| Sparse retrieval | Elastic Cloud | Managed, 1K queries | ~$0.50 |
| Reranking (top 20 docs) | Cohere | rerank-english-v3.0 | ~$2.00 |
| Generation (~2K tokens avg) | OpenAI | gpt-4o | ~$5.00 |
| Total | ~$7.60 |
Sources: OpenAI pricing page (April 2026), Cohere pricing page (April 2026), Pinecone serverless pricing (April 2026).
At modest traffic — 100K queries/month — that’s ~$760/month across five separate invoices. Without unified tracking, your true cost-per-query only becomes visible when you manually reconcile five dashboards at month end.
That’s the problem ATXP solves: one account that funds all five services, with per-pipeline-run cost attribution available in real time.
Why Managing API Keys Per Component Is the Real Risk
The credential problem is worse than the billing problem. Five external API calls means five API key sets, five rotation schedules, five blast-radius vectors if a key leaks or expires.
Haystack addresses this with its Secret type and environment variable conventions — components pull from OPENAI_API_KEY, COHERE_API_KEY, etc. That works for local dev. At agent scale it breaks down:
- No centralized revocation. Rotate one key when a breach occurs. The other four remain exposed until you catch up.
- No per-agent spend isolation. Every agent in a multi-tenant deployment shares the same API credential pool.
- No cross-service audit trail. You can’t answer “what did this agent spend on this task?” without correlating logs from five providers.
ATXP replaces per-service keys with a single agent identity. Your Haystack components authenticate to ATXP; ATXP proxies the actual API calls and records what was spent, when, and by which agent. For why this matters at scale, see The Real Cost of Managing 7 AI APIs Yourself.
How to Add ATXP to Haystack
Two integration patterns depending on how you use Haystack.
Pattern 1: Custom Component Wrapper
For pipeline components that call paid external APIs (web search, image generation, proprietary data APIs), wrap the call in a custom Haystack component that routes payment through ATXP:
from haystack import component
import atxp
@component
class ATXPWebSearchComponent:
def __init__(self, agent_id: str, budget_usd: float = 1.0):
self.client = atxp.Client(agent_id=agent_id)
self.budget_usd = budget_usd
@component.output_types(results=list[dict], error=str)
def run(self, query: str) -> dict:
try:
response = self.client.tools.web_search(
query=query,
budget=self.budget_usd
)
return {"results": response.results, "error": None}
except atxp.BudgetExhausted:
return {"results": [], "error": "agent_budget_exceeded"}
Drop this into any pipeline where you’d otherwise call a paid external search API:
pipeline.add_component(
"search",
ATXPWebSearchComponent(agent_id="agent_haystack_prod", budget_usd=0.50)
)
The component deducts from the agent’s ATXP balance per call. If the agent exhausts its budget mid-pipeline, ATXP raises a typed exception you handle at the component level rather than failing the entire graph. Haystack’s pipeline model supports conditional routing — wire the error output to a fallback branch (cached results, graceful degradation) natively.
Pattern 2: ATXP Tools in a Haystack Agent
For Haystack’s Agent class, register ATXP tools as agent tools directly. Haystack agents use function-calling syntax that maps onto ATXP’s tool interface without a wrapper:
from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
import atxp
client = atxp.Client(agent_id="agent_haystack_research")
def web_search(query: str) -> str:
"""Search the web for current information."""
result = client.tools.web_search(query=query)
return "\n".join(r.snippet for r in result.results[:3])
def generate_image(prompt: str) -> str:
"""Generate an image and return the URL."""
result = client.tools.image_generation(prompt=prompt)
return result.url
agent = Agent(
chat_generator=OpenAIChatGenerator(model="gpt-4o"),
tools=[web_search, generate_image],
system_prompt="You are a research agent with web access and image capabilities."
)
Every tool call routes through ATXP, deducting from the agent’s pre-funded balance. The generator (GPT-4o) still calls OpenAI directly — ATXP covers the external tools the agent invokes, not the LLM inference itself unless you route that through ATXP’s 100+ model gateway.
Ready to fund your Haystack agent? Create an ATXP account at atxp.ai, fund it via Stripe or USDC, grab your agent ID, and replace individual API keys with a single credential across 14+ tools. A Haystack agent running on ATXP takes about 30 minutes end-to-end.
How to Set Per-Pipeline Budget Caps
Without a hard spending ceiling, a malformed query that triggers repeated retrieval-plus-generation loops can accumulate real cost before you notice. Configure limits at the agent level, not the pipeline level:
import atxp
agent_config = atxp.AgentConfig(
agent_id="agent_haystack_prod",
budget_usd=5.00, # Hard ceiling for this agent's lifetime
per_call_limit_usd=0.50, # Max spend for any single tool call
alert_threshold_usd=4.00, # Webhook notification at 80% usage
)
client = atxp.Client(config=agent_config)
For multi-tenant Haystack deployments where each user gets an agent instance, create a separate agent_id per tenant. ATXP tracks spend per agent independently — you attribute costs to specific users or tasks without cross-contamination.
How ATXP’s budget controls compare to alternatives:
| Approach | Per-agent caps | Per-call limits | Unified audit trail | Build time |
|---|---|---|---|---|
| ATXP | Yes | Yes | Yes | ~1 hour |
| Native API keys | No (per-service only) | No | No | — |
| Virtual cards | Yes (card limit) | No | Partial | Setup per card |
| Custom middleware | Requires building | Requires building | Requires building | Days |
For more on why virtual cards don’t scale to agent workloads, see The Three Models for Agent Payments.
Cost Attribution Across Pipeline Components
Haystack doesn’t give you out-of-the-box cost breakdown per pipeline variant. You know what OpenAI charged this month and what Cohere charged, but you don’t know what your summarize_and_retrieve pipeline spent versus your deep_research pipeline.
ATXP’s per-agent attribution solves this directly — create separate agent IDs per pipeline type:
summarize_client = atxp.Client(agent_id="pipeline_summarize_v2")
research_client = atxp.Client(agent_id="pipeline_research_v1")
Every ATXP transaction is tagged with the calling agent ID. At month end, pull a breakdown by agent ID to see what each pipeline variant actually cost. Combine with Haystack’s native tracing (via HaystackTracer or OpenTelemetry) and you get full cost visibility from query input to generated output.
For the full cost math on why this attribution matters, see How Much Does an AI Agent Cost? and How ATXP’s IOU Model Caps Agent Spending.
Async Components for High-Throughput Pipelines
Haystack’s component model is synchronous by default, but ATXP calls are I/O-bound. For high-throughput pipelines, use async:
@component
class ATXPWebSearchAsync:
@component.output_types(results=list[dict])
async def run_async(self, query: str) -> dict:
response = await self.client.tools.web_search_async(query=query)
return {"results": response.results}
Haystack 2.x supports async components natively. This keeps ATXP tool calls non-blocking even in large pipeline graphs with parallel branches.
FAQ
Does ATXP work with Haystack’s hosted document stores like Pinecone or Weaviate?
Not directly — ATXP handles payment for external API tool calls, not for infrastructure you manage. Your Pinecone or Weaviate costs stay on those invoices. ATXP covers the tools your agents actively invoke at runtime: web search, image generation, LLM inference via third-party providers, email, SMS. The distinction is pipeline infrastructure you provision versus tools the agent calls dynamically.
Can I use ATXP if my Haystack generator is a self-hosted or open-source model?
Yes. ATXP is model-agnostic on the generator side. If your Haystack generator calls a self-hosted Ollama or vLLM endpoint, ATXP still covers the tools the agent invokes — web search, external data APIs, image generation. You’re not paying ATXP for LLM inference unless you explicitly route model calls through ATXP’s LLM gateway, which gives access to 100+ models under a single credential.
How does ATXP interact with Haystack’s pipeline serialization and YAML config?
Pass your ATXP agent ID via environment variable (ATXP_AGENT_ID) rather than hardcoding it in component constructors or YAML. The ATXP client resolves this automatically, so serialized pipeline configs stay credential-free — consistent with how Haystack handles OPENAI_API_KEY and other secrets. Set ATXP_AGENT_ID in your deployment environment alongside existing service keys; no pipeline config changes needed when you rotate credentials.
What’s the integration time for an existing Haystack production pipeline?
Roughly three to four hours for a standard hybrid RAG pipeline: ~30 minutes to create an ATXP account, fund it, and generate an agent ID; one to two hours to replace paid component calls with ATXP-routed equivalents; another hour to add budget caps, alert webhooks, and async support for high-throughput components. If you’re starting from scratch on a new pipeline, ATXP can be the payment layer from day one — no retrofit needed.
Haystack is production-grade infrastructure. Your payment layer should be too. Get started with ATXP to give your Haystack agents a real budget with hard controls.