# APIs
Llama Stack exposes OpenAI-compatible REST APIs. For the full endpoint list, see the API Reference.
## Stability Levels
APIs are organized by maturity. Stable APIs (`/v1/`) follow semantic versioning and won't break between minor releases. Experimental APIs (`/v1alpha/`, `/v1beta/`) may change based on feedback. See API Leveling for details.
## How the APIs work together
### Files + Vector Stores + Responses
These three APIs combine to give you RAG in a few calls:
- Upload a document via `/v1/files`
- Create a vector store and attach the file via `/v1/vector_stores`; Llama Stack automatically chunks, embeds, and indexes the document
- Use `file_search` in a `/v1/responses` request to search it
See File Operations and Vector Store Integration for details.
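The three calls above can be sketched with the OpenAI Python SDK pointed at a Llama Stack server. The server URL, file name, and model id (`llama3.2:3b`) are placeholders, not values from this document; substitute your deployment's own.

```python
def file_search_tool(vector_store_id: str) -> dict:
    # Tool spec that points a /v1/responses request at a vector store.
    return {"type": "file_search", "vector_store_ids": [vector_store_id]}


def rag_in_three_calls() -> str:
    from openai import OpenAI  # imported here so the sketch stays self-contained

    # Assumed local Llama Stack server; api_key is unused by a local deployment.
    client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

    # 1. Upload the document (POST /v1/files).
    uploaded = client.files.create(file=open("handbook.pdf", "rb"), purpose="assistants")

    # 2. Create a vector store and attach the file (POST /v1/vector_stores);
    #    the server chunks, embeds, and indexes it automatically.
    store = client.vector_stores.create(name="docs")
    client.vector_stores.files.create(vector_store_id=store.id, file_id=uploaded.id)

    # 3. Ask a question; the model searches the store via the file_search tool.
    response = client.responses.create(
        model="llama3.2:3b",  # placeholder model id
        input="What does the handbook say about time off?",
        tools=[file_search_tool(store.id)],
    )
    return response.output_text
```

Because the APIs are OpenAI-compatible, no Llama Stack-specific client is required for this flow.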
### Tool Runtime
The Responses API orchestrates tools server-side. The model decides which tools to call, Llama Stack executes them and feeds results back automatically:
- `file_search` - searches your vector stores
- `web_search` - queries configured search providers
- MCP - connects to any MCP server
- Custom functions - your own function tools
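The four tool families above are declared in the `tools` array of a `/v1/responses` request. This is an illustrative sketch following the OpenAI tool-spec shapes; the store id, server label/URL, and the `get_weather` function schema are hypothetical.

```python
# Hypothetical tools array for a /v1/responses request; the server executes
# whichever of these the model decides to call and feeds results back.
tools = [
    {"type": "file_search", "vector_store_ids": ["vs_abc123"]},  # search a vector store
    {"type": "web_search"},  # query the configured search provider
    {"type": "mcp", "server_label": "docs", "server_url": "https://example.com/mcp"},
    {
        # Custom function: the model emits arguments, your code runs the call.
        "type": "function",
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
]
```

Built-in tools (`file_search`, `web_search`, MCP) run server-side; only `function` tools round-trip back to your code for execution.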
## Providers and External APIs
Not every provider supports every feature. See API Providers for how providers map to APIs, and External APIs for extending Llama Stack with custom endpoints.