Responses API vs Agents API

Llama Stack provides two APIs for building AI applications with tool calling. The Responses API is the recommended path for new applications.

Use the Responses API

The Responses API is OpenAI-compatible and provides:

  • Dynamic configuration - change model, tools, and vector stores on every call
  • Conversation branching - fork from any previous response via previous_response_id
  • Built-in tool orchestration - file_search, web_search, MCP, and custom functions
  • Standard OpenAI SDK - works with any OpenAI client library
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")

response = client.responses.create(
    model="llama3.2:3b",
    input="Search my docs for deployment instructions",
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["vs_abc123"],
    }],
)

# Continue the conversation, switching models mid-stream
response2 = client.responses.create(
    model="openai/gpt-4o",
    input="Now summarize what you found",
    previous_response_id=response.id,
)
```
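
Because `previous_response_id` can reference any earlier response, not just the latest one, you can fork several independent follow-ups from the same point in the conversation. A minimal sketch, assuming a running Llama Stack server at the same address as above:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")

base = client.responses.create(
    model="llama3.2:3b",
    input="List the steps in our deployment guide",
)

# Two branches forked from the same earlier response.
summary = client.responses.create(
    model="llama3.2:3b",
    input="Summarize those steps in one paragraph",
    previous_response_id=base.id,
)

checklist = client.responses.create(
    model="llama3.2:3b",
    input="Turn those steps into a checklist",
    previous_response_id=base.id,
)
# Each branch has its own id and can be continued independently.
```

Neither branch sees the other's messages; each inherits only the history up to `base`.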

Legacy Agents API

The Agents API is an older, Llama Stack-specific API that uses sessions and turns. It is still functional but is not recommended for new applications.
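
For comparison, a typical Agents API flow looks roughly like the sketch below. It assumes the `llama-stack-client` Python package and a running server; the exact import paths and method signatures (`Agent`, `create_session`, `create_turn`) may vary between package versions.

```python
from llama_stack_client import LlamaStackClient, Agent

client = LlamaStackClient(base_url="http://localhost:8321")

# Model and tools are fixed when the agent is created, not per call.
agent = Agent(
    client,
    model="llama3.2:3b",
    instructions="You are a helpful assistant.",
)

# Turns are appended linearly within a session; there is no branching.
session_id = agent.create_session("my-session")
turn = agent.create_turn(
    messages=[{"role": "user", "content": "Search my docs for deployment instructions"}],
    session_id=session_id,
    stream=False,
)
```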

Key differences from Responses:

|  | Responses API | Agents API |
|---|---|---|
| SDK | Standard OpenAI SDK | Llama Stack client only |
| Configuration | Dynamic per call | Static per session |
| Conversation model | Branching via response IDs | Linear sessions |
| Tools | `file_search`, `web_search`, MCP, functions | `builtin::file_search`, `code_interpreter` |
| Safety | Via `/v1/moderations` or guardrail params | Built-in input/output shields |
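
With Responses, safety checks are an explicit, separate call rather than an agent-attached shield. A hedged sketch using the OpenAI SDK's moderations endpoint against a Llama Stack server (the safety model identifier here is illustrative; use whichever safety model your server registers):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")

# Run a safety check on user input before passing it to the model.
moderation = client.moderations.create(
    model="llama-guard3:1b",  # illustrative model name
    input="How do I deploy my app?",
)
if moderation.results[0].flagged:
    print("Input flagged by the safety model")
```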

If you have existing code using the Agents API, it will continue to work. For new projects, use the Responses API.