# API Providers
Every API in Llama Stack is backed by one or more providers. You can swap providers without changing your application code.
## Provider Types
| Type | How it works | Examples |
|---|---|---|
| Remote (`remote::`) | Adapts an external service via a thin client | Ollama, OpenAI, vLLM, Bedrock, Fireworks, Together |
| Inline (`inline::`) | Runs entirely inside the Llama Stack process | FAISS, SQLite-vec, sentence-transformers, Llama Guard |
Llama Stack provides at least one inline provider for each API so you can run a fully featured stack locally without any external dependencies.
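For example, an inference API backed only by an inline provider needs no external service at all. The snippet below is a hedged sketch of such a configuration; the `inline::sentence-transformers` provider type mirrors the naming convention shown in the table above, but check the Providers section for the exact identifiers in your version:

```yaml
providers:
  inference:
    - provider_id: sentence-transformers
      provider_type: inline::sentence-transformers
      config: {}
```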
## Multiple Providers Per API
You can configure multiple providers for the same API. The routing table dispatches requests to the right provider based on the resource (model, vector store, etc.):
```yaml
providers:
  inference:
    - provider_id: ollama
      provider_type: remote::ollama
      config:
        base_url: http://localhost:11434/v1
    - provider_id: openai
      provider_type: remote::openai
      config:
        api_key: ${env.OPENAI_API_KEY}
```
Each provider automatically discovers its available models at startup. Requests are then routed by model identifier: models served by Ollama go to the Ollama provider, and models served by OpenAI go to the OpenAI provider. Same endpoint, same client code.
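The dispatch described above can be sketched in a few lines of Python. This is an illustrative model of a routing table, not Llama Stack's internal implementation; the `Provider` and `InferenceRouter` names are invented for the example:

```python
class Provider:
    """Stand-in for a configured provider (e.g. remote::ollama)."""

    def __init__(self, provider_id: str):
        self.provider_id = provider_id

    def chat_completion(self, model: str, messages: list) -> str:
        # A real provider would forward the request to Ollama, OpenAI, etc.
        return f"[{self.provider_id}] response for {model}"


class InferenceRouter:
    """Maps model identifiers to the provider that serves them."""

    def __init__(self):
        self.routing_table: dict[str, Provider] = {}

    def register_model(self, model: str, provider: Provider) -> None:
        # In Llama Stack, registration happens automatically at startup
        # when each provider discovers its available models.
        self.routing_table[model] = provider

    def chat_completion(self, model: str, messages: list) -> str:
        provider = self.routing_table.get(model)
        if provider is None:
            raise ValueError(f"no provider registered for model: {model}")
        return provider.chat_completion(model, messages)


router = InferenceRouter()
router.register_model("llama3.2:3b", Provider("ollama"))
router.register_model("gpt-4o-mini", Provider("openai"))

print(router.chat_completion("llama3.2:3b", [{"role": "user", "content": "hi"}]))
# -> [ollama] response for llama3.2:3b
```

The key point is that the caller only names a model; which provider handles the request is resolved entirely by the routing table, which is why swapping providers requires no application changes.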
## Available Providers
See the Providers section for the full list of supported providers per API.