# API Providers
Llama Stack composes 23 inference providers, 15 vector stores, 7 safety backends, 6 tool runtimes, and 3 file storage options into a single deployable server. No other open-source project covers this surface area in one process.
Providers come in two types:

- **Remote**: adapts an external service (Ollama, OpenAI, vLLM, Bedrock, etc.)
- **Inline**: runs in-process within Llama Stack (FAISS, sentence-transformers, Llama Guard, etc.)
At least one inline provider exists for each API so you can run a fully featured stack locally without any external dependencies.
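As a sketch of how the two types appear in a server configuration, the `run.yaml` fragment below wires one remote and one inline provider (the provider IDs, URL, and exact schema here are illustrative, not copied from a shipped distribution):

```yaml
# Illustrative config fragment: one remote and one inline provider.
providers:
  inference:
    - provider_id: ollama
      provider_type: remote::ollama   # remote: adapts an external Ollama server
      config:
        url: http://localhost:11434   # assumed local Ollama endpoint
  vector_io:
    - provider_id: faiss
      provider_type: inline::faiss    # inline: runs in-process, no external service
      config: {}
```

Swapping a provider means changing this configuration, not application code: the API surface the client sees stays the same.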
## Provider categories

- **Inference**: Ollama, vLLM, OpenAI, Bedrock, Anthropic, Gemini, WatsonX, and more
- **Vector IO**: FAISS, SQLite-Vec, ChromaDB, Qdrant, Milvus, PGVector, Weaviate
- **Safety**: Llama Guard, Prompt Guard, Code Scanner, Bedrock Guardrails
- **Tool Runtime**: File Search, Brave Search, Tavily, MCP, Wolfram Alpha
- **Files**: Local filesystem, S3, and OpenAI Files storage backends
- **DatasetIO**: Local filesystem and HuggingFace dataset loading
- **External Providers**: Build your own provider and integrate it with Llama Stack
## Reference
| Category | Count | Examples | Docs |
|---|---|---|---|
| Inference | 23 | Ollama, vLLM, OpenAI, Bedrock, Anthropic, Gemini, WatsonX, and more | Inference Providers |
| Vector IO | 15 | FAISS, ChromaDB, Qdrant, Milvus, PGVector, Weaviate, Elasticsearch | Vector IO Providers |
| Safety | 7 | Llama Guard, Prompt Guard, Code Scanner, Bedrock Guardrails | Safety Providers |
| Tool Runtime | 6 | File Search, Brave Search, Tavily, MCP, Wolfram Alpha | Tool Runtime Providers |
| Files | 3 | Local filesystem, S3, OpenAI Files | Files Providers |
| DatasetIO | 2 | Local filesystem, HuggingFace | DatasetIO Providers |
| External | — | Build your own provider | External Providers Guide |
## Why this matters
Gateways like LiteLLM route inference requests across multiple providers. That is useful, but inference routing covers only one API.
Llama Stack composes the full application: an agent running on the Responses API can call a model via any inference provider, search documents in any vector store, check content with any safety backend, invoke tools via MCP, and stream the result back. All in one server process, all through OpenAI-compatible endpoints.
The composition is the hard part. Making inference + vector stores + files + safety + tools + agentic orchestration work together correctly across dozens of provider combinations is what takes years, not months.
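To make "all through OpenAI-compatible endpoints" concrete, the sketch below builds a standard OpenAI Responses API request aimed at a local server. The base URL and model name are illustrative placeholders, not defaults; the point is that the wire format is unchanged from what `api.openai.com` accepts.

```python
import json

# Assumed local Llama Stack endpoint; adjust to your deployment.
BASE_URL = "http://localhost:8321/v1"

def responses_request(model: str, user_input: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for a POST to the Responses endpoint.

    The payload shape is the standard OpenAI Responses API; because the
    server is OpenAI-compatible, the same body works against api.openai.com.
    """
    body = {"model": model, "input": user_input}
    return f"{BASE_URL}/responses", json.dumps(body).encode()

url, payload = responses_request("meta-llama/Llama-3.2-3B-Instruct", "Hello!")
```

The official OpenAI SDK works the same way: point its `base_url` at the server and call `client.responses.create(...)` as usual.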
## OpenAI Compatibility
Llama Stack is OpenAI-first: the primary API surface implements the OpenAI spec. An Anthropic Messages adapter (`/v1/messages`) translates Anthropic-style requests to the inference API for teams that use the Anthropic SDK.
See the OpenAI compatibility guide and known Responses API limitations.
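For illustration, here is a minimal Anthropic Messages request body of the kind the `/v1/messages` adapter accepts. The server URL and model name are hypothetical placeholders; the body shape (`model`, `max_tokens`, `messages`) follows the Anthropic Messages API schema.

```python
import json

# Assumed local Llama Stack endpoint; adjust to your deployment.
BASE_URL = "http://localhost:8321"

def messages_request(model: str, text: str, max_tokens: int = 256) -> tuple[str, bytes]:
    """Build the URL and JSON body for an Anthropic Messages API call.

    `model` and `max_tokens` are required fields in the Messages schema;
    the adapter translates this shape onto the inference API.
    """
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": text}],
    }
    return f"{BASE_URL}/v1/messages", json.dumps(body).encode()

url, payload = messages_request("meta-llama/Llama-3.2-3B-Instruct", "Hello!")
```

The Anthropic SDK can target this endpoint by overriding its base URL, so existing Anthropic-SDK code paths need no rewrite.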