# Starter Distribution
The llamastack/distribution-starter distribution is a comprehensive, multi-provider distribution that includes most of the available inference providers in Llama Stack. It's designed to be a one-stop solution for developers who want to experiment with different AI providers without having to configure each one individually.
## Provider Composition
The starter distribution consists of the following provider configurations:
| API | Provider(s) |
|---|---|
| responses | inline::builtin |
| datasetio | remote::huggingface, inline::localfs |
| eval | inline::builtin |
| files | inline::localfs |
| inference | remote::openai, remote::fireworks, remote::together, remote::ollama, remote::anthropic, remote::gemini, remote::groq, remote::sambanova, remote::vllm, remote::cerebras, remote::llama-openai-compat, remote::nvidia, inline::sentence-transformers |
| safety | inline::llama-guard |
| scoring | inline::basic, inline::llm-as-judge, inline::braintrust |
| tool_runtime | remote::brave-search, remote::tavily-search, inline::file-search, remote::model-context-protocol |
| vector_io | inline::faiss, inline::sqlite-vec, inline::milvus, remote::chromadb, remote::pgvector, remote::qdrant, remote::weaviate, remote::elasticsearch, remote::infinispan |
## Inference Providers
The starter distribution includes a comprehensive set of inference providers:
### Hosted Providers
- OpenAI: GPT-4, GPT-3.5, O1, O3, O4 models and text embeddings - provider ID: `openai` - reference documentation: openai
- Fireworks: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and embeddings - provider ID: `fireworks` - reference documentation: fireworks
- Together: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and embeddings - provider ID: `together` - reference documentation: together
- Anthropic: Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude 3.5 Haiku, and Voyage embeddings - provider ID: `anthropic` - reference documentation: anthropic
- Gemini: Gemini 1.5, 2.0, 2.5 models and text embeddings - provider ID: `gemini` - reference documentation: gemini
- Groq: Fast Llama models (3.1, 3.2, 3.3, 4 Scout, 4 Maverick) - provider ID: `groq` - reference documentation: groq
- SambaNova: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models - provider ID: `sambanova` - reference documentation: sambanova
- Cerebras: Cerebras AI models - provider ID: `cerebras` - reference documentation: cerebras
- NVIDIA: NVIDIA NIM - provider ID: `nvidia` - reference documentation: nvidia
- Bedrock: AWS Bedrock models - provider ID: `bedrock` - reference documentation: bedrock
### Local/Remote Providers
- Ollama: Local Ollama models - provider ID: `ollama` - reference documentation: ollama
- vLLM: Local or remote vLLM server - provider ID: `vllm` - reference documentation: vllm
- Sentence Transformers: Local embedding models - provider ID: `sentence-transformers` - reference documentation: sentence-transformers
All providers are disabled by default, so you must enable them by setting the appropriate environment variables (see Enabling Providers below).
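With the server running, you can confirm which providers were actually loaded. A minimal sketch, assuming the `llama-stack-client` CLI is installed and pointed at the default port:

```bash
# List the providers the running stack has loaded.
llama-stack-client providers list
```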
## Vector IO
The starter distribution includes a comprehensive set of vector IO providers:
- FAISS: Local FAISS vector store - enabled by default - provider ID: `faiss`
- SQLite: Local SQLite vector store - disabled by default - provider ID: `sqlite-vec`
- ChromaDB: Remote ChromaDB vector store - disabled by default - provider ID: `chromadb`
- PGVector: PostgreSQL vector store - disabled by default - provider ID: `pgvector`
- Milvus: Milvus vector store - disabled by default - provider ID: `milvus`
- Qdrant: Qdrant vector store - disabled by default - provider ID: `qdrant`
- Weaviate: Weaviate vector store - disabled by default - provider ID: `weaviate`
- Elasticsearch: Elasticsearch vector store - disabled by default - provider ID: `elasticsearch`
- Infinispan: Infinispan vector store - disabled by default - provider ID: `infinispan`
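For example, to opt in to the local SQLite-vec store alongside the default FAISS store (a minimal sketch; the exact truthy value accepted by `ENABLE_SQLITE_VEC` is an assumption):

```bash
# FAISS is enabled by default; opt in to SQLite-vec as well.
export ENABLE_SQLITE_VEC=true
# Optional: where the SQLite files are stored (default shown).
export SQLITE_STORE_DIR=~/.llama/distributions/starter
```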
## Environment Variables
The following environment variables can be configured:
### Server Configuration
- `LLAMA_STACK_PORT`: Port for the Llama Stack distribution server (default: `8321`)
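For example, to serve on a different port (a minimal sketch, using the quick-start command shown under Running below):

```bash
# Run the starter distribution on port 8322 instead of the default 8321.
LLAMA_STACK_PORT=8322 uvx --from 'llama-stack[starter]' llama stack run starter
```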
### API Keys for Hosted Providers
- `OPENAI_API_KEY`: OpenAI API key
- `FIREWORKS_API_KEY`: Fireworks API key
- `TOGETHER_API_KEY`: Together API key
- `ANTHROPIC_API_KEY`: Anthropic API key
- `GEMINI_API_KEY`: Google Gemini API key
- `GROQ_API_KEY`: Groq API key
- `SAMBANOVA_API_KEY`: SambaNova API key
- `CEREBRAS_API_KEY`: Cerebras API key
- `LLAMA_API_KEY`: Llama API key
- `NVIDIA_API_KEY`: NVIDIA API key
- `HF_API_TOKEN`: HuggingFace API token
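Setting any of these keys enables the corresponding provider. For example (the key values are placeholders):

```bash
# Placeholders: substitute your real API keys.
export OPENAI_API_KEY=sk-...        # enables the OpenAI inference provider
export ANTHROPIC_API_KEY=sk-ant-... # enables the Anthropic inference provider
```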
### Local Provider Configuration
- `OLLAMA_URL`: Ollama server URL (default: `http://localhost:11434`)
- `VLLM_URL`: vLLM server URL (default: `http://localhost:8000/v1`)
- `VLLM_MAX_TOKENS`: vLLM max tokens (default: `4096`)
- `VLLM_API_TOKEN`: vLLM API token (default: `fake`)
- `VLLM_TLS_VERIFY`: vLLM TLS verification (default: `true`)
- `TGI_URL`: TGI server URL
### Model Configuration
- `INFERENCE_MODEL`: HuggingFace model for serverless inference
- `INFERENCE_ENDPOINT_NAME`: HuggingFace endpoint name
### Vector Database Configuration
- `SQLITE_STORE_DIR`: SQLite store directory (default: `~/.llama/distributions/starter`)
- `ENABLE_SQLITE_VEC`: Enable SQLite vector provider
- `ENABLE_CHROMADB`: Enable ChromaDB provider
- `ENABLE_PGVECTOR`: Enable PGVector provider
- `CHROMADB_URL`: ChromaDB server URL
- `PGVECTOR_HOST`: PGVector host (default: `localhost`)
- `PGVECTOR_PORT`: PGVector port (default: `5432`)
- `PGVECTOR_DB`: PGVector database name
- `PGVECTOR_USER`: PGVector username
- `PGVECTOR_PASSWORD`: PGVector password
- `MILVUS_URL`: Milvus server URL
- `QDRANT_URL`: Qdrant server URL
- `WEAVIATE_CLUSTER_URL`: Weaviate cluster URL
- `ELASTICSEARCH_URL`: Elasticsearch server URL (default: `localhost:9200`)
- `ELASTICSEARCH_API_KEY`: Elasticsearch API key
- `INFINISPAN_URL`: Infinispan server URL (default: `http://localhost:11222`)
- `INFINISPAN_USERNAME`: Infinispan authentication username (default: `admin`)
- `INFINISPAN_PASSWORD`: Infinispan authentication password
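PGVector needs several of these variables set together. A minimal sketch, assuming a local PostgreSQL instance with the pgvector extension installed (the username and password are placeholders):

```bash
# Connection settings for a local PostgreSQL + pgvector instance.
export PGVECTOR_HOST=localhost
export PGVECTOR_PORT=5432
export PGVECTOR_DB=llama_stack_db   # setting the DB name enables the provider
export PGVECTOR_USER=llama          # placeholder username
export PGVECTOR_PASSWORD=change_me  # placeholder password
```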
### Tool Configuration
- `BRAVE_SEARCH_API_KEY`: Brave Search API key
- `TAVILY_SEARCH_API_KEY`: Tavily Search API key
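Either key enables the matching search tool runtime. For example (the key value is a placeholder):

```bash
# Placeholder: substitute your real Tavily key.
export TAVILY_SEARCH_API_KEY=tvly-...  # enables the tavily-search tool runtime
```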
## Enabling Providers
You can enable specific providers by setting the appropriate environment variables. For example:
```bash
# self-hosted
export OLLAMA_URL=http://localhost:11434/v1  # enables the Ollama inference provider
export VLLM_URL=http://localhost:8000/v1     # enables the vLLM inference provider
export TGI_URL=http://localhost:8000/v1      # enables the TGI inference provider

# cloud-hosted, requiring API key configuration on the server
export CEREBRAS_API_KEY=your_cerebras_api_key  # enables the Cerebras inference provider
export NVIDIA_API_KEY=your_nvidia_api_key      # enables the NVIDIA inference provider

# vector providers
export MILVUS_URL=http://localhost:19530      # enables the Milvus vector provider
export CHROMADB_URL=http://localhost:8000/v1  # enables the ChromaDB vector provider
export PGVECTOR_DB=llama_stack_db             # enables the PGVector vector provider
export INFINISPAN_URL=http://localhost:11222  # enables the Infinispan vector provider
```
This distribution comes with a default "llama-guard" shield that can be enabled by setting the `SAFETY_MODEL` environment variable to an appropriate Llama Guard model ID. Use `llama-stack-client models list` to see the available models.
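A minimal sketch, assuming a Llama Guard model is served by one of your enabled inference providers (the model ID below is an example; pick one from the models list):

```bash
# Example Llama Guard model ID; confirm it with `llama-stack-client models list`.
export SAFETY_MODEL=meta-llama/Llama-Guard-3-8B
```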
## Running
See Starting a Llama Stack Server for all the ways to run (uv, container, library, Kubernetes).
Quick start:

```bash
uvx --from 'llama-stack[starter]' llama stack run starter
```
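Once the server is up, you can sanity-check it over HTTP. A minimal sketch, assuming the default port and the `/v1/health` endpoint exposed by current Llama Stack releases:

```bash
# Expect a small JSON status payload if the server is healthy.
curl http://localhost:8321/v1/health
```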
### PostgreSQL Storage
By default, the starter distribution uses SQLite. For production, use PostgreSQL:
```bash
uvx --from 'llama-stack[starter]' llama stack run starter::run-with-postgres-store.yaml
```
The following environment variables configure the PostgreSQL store (defaults shown):
| Variable | Default |
|---|---|
| POSTGRES_HOST | localhost |
| POSTGRES_PORT | 5432 |
| POSTGRES_DB | llamastack |
| POSTGRES_USER | llamastack |
| POSTGRES_PASSWORD | llamastack |
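A minimal sketch of a production-style launch, overriding the defaults above (the host and password are placeholders):

```bash
# Placeholders: point at your own PostgreSQL instance.
export POSTGRES_HOST=db.example.internal
export POSTGRES_PORT=5432
export POSTGRES_DB=llamastack
export POSTGRES_USER=llamastack
export POSTGRES_PASSWORD=change_me

uvx --from 'llama-stack[starter]' llama stack run starter::run-with-postgres-store.yaml
```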