APIs

Llama Stack exposes OpenAI-compatible REST APIs. For the full endpoint list, see the API Reference.

Stability Levels

APIs are organized by maturity. Stable APIs (/v1/) follow semantic versioning and won't break between minor releases. Experimental APIs (/v1alpha/, /v1beta/) may change based on feedback. See API Leveling for details.

How the APIs work together

Files + Vector Stores + Responses

These three APIs combine to give you RAG in a few calls:

  1. Upload a document via /v1/files
  2. Create a vector store and attach the file via /v1/vector_stores
  3. Llama Stack automatically chunks, embeds, and indexes the document
  4. Use file_search in a /v1/responses request to search it
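The four steps above can be sketched with any OpenAI-compatible SDK. This is an illustrative sketch, not documented Llama Stack client code: the `rag_quickstart` helper, the base URL, and the model id are assumptions.

```python
# Sketch of the RAG flow above, assuming an OpenAI-compatible client
# (e.g. the openai package) pointed at a Llama Stack server:
#
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

def rag_quickstart(client, path, question, model="your-model-id"):
    # 1. Upload a document via /v1/files
    with open(path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="assistants")
    # 2. Create a vector store and attach the file via /v1/vector_stores;
    # 3. the server chunks, embeds, and indexes the file automatically
    store = client.vector_stores.create(name="docs", file_ids=[uploaded.id])
    # 4. Search it with the file_search tool in a /v1/responses request
    resp = client.responses.create(
        model=model,
        input=question,
        tools=[{"type": "file_search", "vector_store_ids": [store.id]}],
    )
    return resp.output_text
```

The same flow works with raw HTTP calls to the endpoints named in each step; the SDK methods are just thin wrappers over them.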

See File Operations and Vector Store Integration for details.

Tool Runtime

The Responses API orchestrates tools server-side. The model decides which tools to call; Llama Stack executes them and feeds the results back automatically:

  • file_search - searches your vector stores
  • web_search - queries configured search providers
  • MCP - connects to any MCP server
  • Custom functions - your own function tools
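Each of the four tool types above corresponds to an entry in the `tools` array of a `/v1/responses` request. A hedged sketch: the field names follow the OpenAI Responses tool schema, and the MCP server URL and `get_weather` function are made-up examples, not part of Llama Stack.

```python
# Illustrative tools array for a /v1/responses request; the MCP server URL
# and the get_weather function are hypothetical.
tools = [
    # file_search: searches the listed vector stores server-side
    {"type": "file_search", "vector_store_ids": ["vs_123"]},
    # web_search: queries whichever search provider the stack is configured with
    {"type": "web_search"},
    # MCP: the server connects to the given MCP endpoint and exposes its tools
    {"type": "mcp", "server_label": "docs", "server_url": "http://localhost:3000/mcp"},
    # Custom function: described via JSON Schema; the model emits a call with
    # matching arguments when it decides the tool is needed
    {
        "type": "function",
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
]
```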

Providers and External APIs

Not every provider supports every feature. See API Providers for how providers map to APIs, and External APIs for extending Llama Stack with custom endpoints.