API Providers

Llama Stack composes 23 inference providers, 15 vector stores, 7 safety backends, 6 tool runtimes, and 3 file storage options into a single deployable server. No other open-source project covers this surface area in one process.

Providers come in two types:

  • Remote: adapts an external service (Ollama, OpenAI, vLLM, Bedrock, etc.)
  • Inline: runs in-process within Llama Stack (FAISS, sentence-transformers, Llama Guard, etc.)
> **Info:** At least one inline provider exists for every API, so you can run a fully featured stack locally without any external dependencies.
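A fully local stack pins each API to an inline provider in the run configuration. The fragment below is a sketch, not a complete `run.yaml`: the provider types shown (`inline::faiss`, `inline::llama-guard`, `inline::sentence-transformers`) mirror the inline examples above, but field names and required `config` blocks depend on your Llama Stack version.

```yaml
# Sketch of a run.yaml fragment: every API backed by an inline provider,
# so the whole stack runs in one process with no external services.
providers:
  inference:
    - provider_id: sentence-transformers
      provider_type: inline::sentence-transformers
      config: {}
  vector_io:
    - provider_id: faiss
      provider_type: inline::faiss
      config: {}
  safety:
    - provider_id: llama-guard
      provider_type: inline::llama-guard
      config: {}
```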

Provider categories

| Category | Count | Examples | Docs |
| --- | --- | --- | --- |
| Inference | 23 | Ollama, vLLM, OpenAI, Bedrock, Anthropic, Gemini, WatsonX, and more | Inference Providers |
| Vector IO | 15 | FAISS, ChromaDB, Qdrant, Milvus, PGVector, Weaviate, Elasticsearch | Vector IO Providers |
| Safety | 7 | Llama Guard, Prompt Guard, Code Scanner, Bedrock Guardrails | Safety Providers |
| Tool Runtime | 6 | File Search, Brave Search, Tavily, MCP, Wolfram Alpha | Tool Runtime Providers |
| Files | 3 | Local filesystem, S3, OpenAI Files | Files Providers |
| DatasetIO | 2 | Local filesystem, HuggingFace | DatasetIO Providers |
| External | — | Build your own provider | External Providers Guide |

Why this matters

Gateways like LiteLLM route inference requests to multiple providers. That's useful, but inference routing is one API.

Llama Stack composes the full application: an agent running on the Responses API can call a model via any inference provider, search documents in any vector store, check content with any safety backend, invoke tools via MCP, and stream the result back. All in one server process, all through OpenAI-compatible endpoints.

The composition is the hard part. Making inference + vector stores + files + safety + tools + agentic orchestration work together correctly across dozens of provider combinations is what takes years, not months.
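One concrete way to see the composition: a single Responses API call can reference a model (inference), a vector store (Vector IO), and a built-in tool, all resolved by the server. The sketch below builds such a request with only the standard library so the wire format is visible; the base URL, model alias, and vector store id are assumptions, and the request is constructed but not sent.

```python
# Build (but do not send) one composed Responses API call against an
# assumed local Llama Stack server. Model alias "llama3.2:3b" and
# vector store id "vs_docs" are hypothetical placeholders.
import json
import urllib.request

BASE_URL = "http://localhost:8321/v1"  # assumed server address

def build_request(question: str) -> urllib.request.Request:
    """Prepare a POST /v1/responses call that can search a vector store."""
    payload = {
        "model": "llama3.2:3b",
        "input": question,
        "tools": [{"type": "file_search", "vector_store_ids": ["vs_docs"]}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/responses",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending this request with `urllib.request.urlopen` (or any OpenAI SDK pointed at the same base URL) exercises inference, vector search, and tool invocation in one round trip.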

OpenAI Compatibility

Llama Stack is OpenAI-first. The primary API surface implements the OpenAI spec. An Anthropic Messages adapter (/v1/messages) translates to the inference API for teams that use the Anthropic SDK.
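To illustrate what the adapter translates, here is the same single-turn request phrased for both surfaces. The endpoint paths come from the text above; the model alias is a placeholder, and note that the Anthropic Messages API additionally requires `max_tokens`.

```python
# The same question as an OpenAI-style and an Anthropic-style payload.
# Model alias "llama3.2:3b" is a hypothetical placeholder.

openai_style = {  # POST /v1/chat/completions (primary OpenAI surface)
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "Hello"}],
}

anthropic_style = {  # POST /v1/messages (Anthropic Messages adapter)
    "model": "llama3.2:3b",
    "max_tokens": 256,  # required by the Anthropic Messages API
    "messages": [{"role": "user", "content": "Hello"}],
}
```

The adapter's job is the mapping between these shapes, so teams on the Anthropic SDK can talk to the same inference providers without changing client code.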

See the OpenAI compatibility guide and known Responses API limitations.