# Starter Distribution
The llamastack/distribution-starter distribution is a comprehensive, multi-provider distribution that includes most of the available inference providers in Llama Stack. It's designed to be a one-stop solution for developers who want to experiment with different AI providers without having to configure each one individually.
## Provider Composition
The starter distribution consists of the following provider configurations:
| API | Provider(s) |
|---|---|
| responses | inline::builtin |
| datasetio | remote::huggingface, inline::localfs |
| eval | inline::builtin |
| files | inline::localfs |
| inference | remote::openai, remote::fireworks, remote::together, remote::ollama, remote::anthropic, remote::gemini, remote::groq, remote::sambanova, remote::vllm, remote::cerebras, remote::llama-openai-compat, remote::nvidia, inline::sentence-transformers |
| safety | inline::llama-guard |
| scoring | inline::basic, inline::llm-as-judge, inline::braintrust |
| tool_runtime | remote::brave-search, remote::tavily-search, inline::file-search, remote::model-context-protocol |
| vector_io | inline::faiss, inline::sqlite-vec, inline::milvus, remote::chromadb, remote::pgvector, remote::qdrant, remote::weaviate, remote::elasticsearch, remote::infinispan |
## Inference Providers
The starter distribution includes a comprehensive set of inference providers:
### Hosted Providers
- OpenAI: GPT-4, GPT-3.5, O1, O3, O4 models and text embeddings - provider ID: `openai` - reference documentation: openai
- Fireworks: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and embeddings - provider ID: `fireworks` - reference documentation: fireworks
- Together: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and embeddings - provider ID: `together` - reference documentation: together
- Anthropic: Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude 3.5 Haiku, and Voyage embeddings - provider ID: `anthropic` - reference documentation: anthropic
- Gemini: Gemini 1.5, 2.0, 2.5 models and text embeddings - provider ID: `gemini` - reference documentation: gemini
- Groq: Fast Llama models (3.1, 3.2, 3.3, 4 Scout, 4 Maverick) - provider ID: `groq` - reference documentation: groq
- SambaNova: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models - provider ID: `sambanova` - reference documentation: sambanova
- Cerebras: Cerebras AI models - provider ID: `cerebras` - reference documentation: cerebras
- NVIDIA: NVIDIA NIM - provider ID: `nvidia` - reference documentation: nvidia
- Bedrock: AWS Bedrock models - provider ID: `bedrock` - reference documentation: bedrock
### Local/Remote Providers
- Ollama: Local Ollama models - provider ID: `ollama` - reference documentation: ollama
- vLLM: Local or remote vLLM server - provider ID: `vllm` - reference documentation: vllm
- Sentence Transformers: Local embedding models - provider ID: `sentence-transformers` - reference documentation: sentence-transformers
All providers are disabled by default, so you must enable them by setting the appropriate environment variables (see Enabling Providers below).
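With the server running, you can confirm which providers were actually loaded. A minimal sketch, assuming the `llama-stack-client` CLI is installed and pointed at the default port:

```bash
# List the providers the running stack has loaded.
llama-stack-client providers list
```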
## Vector IO
The starter distribution includes a comprehensive set of vector IO providers:
- FAISS: Local FAISS vector store - enabled by default - provider ID: `faiss`
- SQLite: Local SQLite vector store - disabled by default - provider ID: `sqlite-vec`
- ChromaDB: Remote ChromaDB vector store - disabled by default - provider ID: `chromadb`
- PGVector: PostgreSQL vector store - disabled by default - provider ID: `pgvector`
- Milvus: Milvus vector store - disabled by default - provider ID: `milvus`
- Qdrant: Qdrant vector store - disabled by default - provider ID: `qdrant`
- Weaviate: Weaviate vector store - disabled by default - provider ID: `weaviate`
- Elasticsearch: Elasticsearch vector store - disabled by default - provider ID: `elasticsearch`
- Infinispan: Infinispan vector store - disabled by default - provider ID: `infinispan`
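For example, to opt in to the local SQLite-vec store alongside the default FAISS store (a minimal sketch; the exact truthy value accepted by `ENABLE_SQLITE_VEC` is an assumption):

```bash
# FAISS is enabled by default; opt in to SQLite-vec as well.
export ENABLE_SQLITE_VEC=true
# Optional: where the SQLite files are stored (default shown).
export SQLITE_STORE_DIR=~/.llama/distributions/starter
```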
## Environment Variables
The following environment variables can be configured:
### Server Configuration
- `LLAMA_STACK_PORT`: Port for the Llama Stack distribution server (default: `8321`)
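For example, to serve on a different port (a minimal sketch, using the quick-start command shown under Running below):

```bash
# Run the starter distribution on port 8322 instead of the default 8321.
LLAMA_STACK_PORT=8322 uvx --from 'llama-stack[starter]' llama stack run starter
```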
### API Keys for Hosted Providers
- `OPENAI_API_KEY`: OpenAI API key
- `FIREWORKS_API_KEY`: Fireworks API key
- `TOGETHER_API_KEY`: Together API key
- `ANTHROPIC_API_KEY`: Anthropic API key
- `GEMINI_API_KEY`: Google Gemini API key
- `GROQ_API_KEY`: Groq API key
- `SAMBANOVA_API_KEY`: SambaNova API key
- `CEREBRAS_API_KEY`: Cerebras API key
- `LLAMA_API_KEY`: Llama API key
- `NVIDIA_API_KEY`: NVIDIA API key
- `HF_API_TOKEN`: HuggingFace API token
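Setting any of these keys enables the corresponding provider. For example (the key values are placeholders):

```bash
# Placeholders: substitute your real API keys.
export OPENAI_API_KEY=sk-...        # enables the OpenAI inference provider
export ANTHROPIC_API_KEY=sk-ant-... # enables the Anthropic inference provider
```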
### Local Provider Configuration
- `OLLAMA_URL`: Ollama server URL (default: `http://localhost:11434`)
- `VLLM_URL`: vLLM server URL (default: `http://localhost:8000/v1`)
- `VLLM_MAX_TOKENS`: vLLM max tokens (default: `4096`)
- `VLLM_API_TOKEN`: vLLM API token (default: `fake`)
- `VLLM_TLS_VERIFY`: vLLM TLS verification (default: `true`)
- `TGI_URL`: TGI server URL
### Model Configuration
- `INFERENCE_MODEL`: HuggingFace model for serverless inference
- `INFERENCE_ENDPOINT_NAME`: HuggingFace endpoint name
### Vector Database Configuration
- `SQLITE_STORE_DIR`: SQLite store directory (default: `~/.llama/distributions/starter`)
- `ENABLE_SQLITE_VEC`: Enable SQLite vector provider
- `ENABLE_CHROMADB`: Enable ChromaDB provider
- `ENABLE_PGVECTOR`: Enable PGVector provider
- `CHROMADB_URL`: ChromaDB server URL
- `PGVECTOR_HOST`: PGVector host (default: `localhost`)
- `PGVECTOR_PORT`: PGVector port (default: `5432`)
- `PGVECTOR_DB`: PGVector database name
- `PGVECTOR_USER`: PGVector username
- `PGVECTOR_PASSWORD`: PGVector password
- `MILVUS_URL`: Milvus server URL
- `QDRANT_URL`: Qdrant server URL
- `WEAVIATE_CLUSTER_URL`: Weaviate cluster URL
- `ELASTICSEARCH_URL`: Elasticsearch server URL (default: `localhost:9200`)
- `ELASTICSEARCH_API_KEY`: Elasticsearch API key
- `INFINISPAN_URL`: Infinispan server URL (default: `http://localhost:11222`)
- `INFINISPAN_USERNAME`: Infinispan authentication username (default: `admin`)
- `INFINISPAN_PASSWORD`: Infinispan authentication password
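PGVector needs several of these variables set together. A minimal sketch, assuming a local PostgreSQL instance with the pgvector extension installed (the username and password are placeholders):

```bash
# Connection settings for a local PostgreSQL + pgvector instance.
export PGVECTOR_HOST=localhost
export PGVECTOR_PORT=5432
export PGVECTOR_DB=llama_stack_db   # setting the DB name enables the provider
export PGVECTOR_USER=llama          # placeholder username
export PGVECTOR_PASSWORD=change_me  # placeholder password
```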
### Tool Configuration
- `BRAVE_SEARCH_API_KEY`: Brave Search API key
- `TAVILY_SEARCH_API_KEY`: Tavily Search API key
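Either key enables the matching search tool runtime. For example (the key value is a placeholder):

```bash
# Placeholder: substitute your real Tavily key.
export TAVILY_SEARCH_API_KEY=tvly-...  # enables the tavily-search tool runtime
```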
## Enabling Providers
You can enable specific providers by setting the appropriate environment variables. For example:
```bash
# self-hosted
export OLLAMA_URL=http://localhost:11434/v1  # enables the Ollama inference provider
export VLLM_URL=http://localhost:8000/v1     # enables the vLLM inference provider
export TGI_URL=http://localhost:8000/v1      # enables the TGI inference provider

# cloud-hosted, requiring API key configuration on the server
export CEREBRAS_API_KEY=your_cerebras_api_key  # enables the Cerebras inference provider
export NVIDIA_API_KEY=your_nvidia_api_key      # enables the NVIDIA inference provider

# vector providers
export MILVUS_URL=http://localhost:19530      # enables the Milvus vector provider
export CHROMADB_URL=http://localhost:8000/v1  # enables the ChromaDB vector provider
export PGVECTOR_DB=llama_stack_db             # enables the PGVector vector provider
export INFINISPAN_URL=http://localhost:11222  # enables the Infinispan vector provider
```
This distribution comes with a default "llama-guard" shield that can be enabled by setting the `SAFETY_MODEL` environment variable to an appropriate Llama Guard model ID. Use `llama-stack-client models list` to see the available models.
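A minimal sketch, assuming a Llama Guard model is served by one of your enabled inference providers (the model ID below is an example; pick one from the models list):

```bash
# Example Llama Guard model ID; confirm it with `llama-stack-client models list`.
export SAFETY_MODEL=meta-llama/Llama-Guard-3-8B
```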
## Running
See Starting a Llama Stack Server for all the ways to run (uv, container, library, Kubernetes).
Quick start:

```bash
uvx --from 'llama-stack[starter]' llama stack run starter
```
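Once the server is up, you can sanity-check it over HTTP. A minimal sketch, assuming the default port and the `/v1/health` endpoint exposed by current Llama Stack releases:

```bash
# Expect a small JSON status payload if the server is healthy.
curl http://localhost:8321/v1/health
```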
### PostgreSQL Storage
By default, the starter distribution uses SQLite. For production, use PostgreSQL:
```bash
uvx --from 'llama-stack[starter]' llama stack run starter::run-with-postgres-store.yaml
```
The following environment variables configure the PostgreSQL store (defaults shown):
| Variable | Default |
|---|---|
| POSTGRES_HOST | localhost |
| POSTGRES_PORT | 5432 |
| POSTGRES_DB | llamastack |
| POSTGRES_USER | llamastack |
| POSTGRES_PASSWORD | llamastack |
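A minimal sketch of a production-style launch, overriding the defaults above (the host and password are placeholders):

```bash
# Placeholders: point at your own PostgreSQL instance.
export POSTGRES_HOST=db.example.internal
export POSTGRES_PORT=5432
export POSTGRES_DB=llamastack
export POSTGRES_USER=llamastack
export POSTGRES_PASSWORD=change_me

uvx --from 'llama-stack[starter]' llama stack run starter::run-with-postgres-store.yaml
```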