Skip to main content

Anthropic Messages API

Llama Stack provides native support for the Anthropic Messages API at /v1/messages. Point the official Anthropic SDK at your Llama Stack server and use any model.

from anthropic import Anthropic

client = Anthropic(base_url="http://localhost:8321/v1", api_key="fake")
message = client.messages.create(
model="llama-3.3-70b",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello"}],
)

How it works

The Messages adapter checks whether the configured inference provider natively supports the /v1/messages endpoint. If it does, the request is forwarded directly. Otherwise, the adapter translates between Anthropic and OpenAI formats.

Translation path (most providers): the adapter converts the full request and response between formats:

  • Request: Anthropic system, messages, tools, tool_choice, stop_sequences, and thinking are mapped to their OpenAI equivalents
  • Response: OpenAI completions are converted back to Anthropic format, including content blocks, stop_reason, and usage
  • Streaming: OpenAI streaming chunks are transformed into Anthropic SSE events (message_start, content_block_delta, message_delta, etc.)
  • Tool use: Anthropic tool_use / tool_result blocks are translated to and from OpenAI tool_calls / tool messages

Passthrough path (Ollama, vLLM): the request is forwarded directly to the provider's /v1/messages endpoint without any translation, since these providers natively speak the Anthropic format.

Implemented endpoints

EndpointMethodDescription
/v1/messagesPOSTCreate a message (streaming and non-streaming)
/v1/messages/count_tokensPOSTCount tokens for a message

Supported features

  • System messages (string or content blocks)
  • Multi-turn conversations
  • Streaming via Server-Sent Events
  • Tool definitions and tool use
  • Extended thinking (thinking blocks)
  • Temperature, top_p, top_k, stop sequences
  • Token counting

Not yet implemented

  • Message Batches (/v1/messages/batches)

For property-level conformance details and missing properties, see the conformance report.