Starter Distribution

The llamastack/distribution-starter distribution is a comprehensive, multi-provider distribution that includes most of the available inference providers in Llama Stack. It's designed to be a one-stop solution for developers who want to experiment with different AI providers without having to configure each one individually.

Provider Composition

The starter distribution consists of the following provider configurations:

| API | Provider(s) |
|---|---|
| responses | inline::builtin |
| datasetio | remote::huggingface, inline::localfs |
| eval | inline::builtin |
| files | inline::localfs |
| inference | remote::openai, remote::fireworks, remote::together, remote::ollama, remote::anthropic, remote::gemini, remote::groq, remote::sambanova, remote::vllm, remote::cerebras, remote::llama-openai-compat, remote::nvidia, inline::sentence-transformers |
| safety | inline::llama-guard |
| scoring | inline::basic, inline::llm-as-judge, inline::braintrust |
| tool_runtime | remote::brave-search, remote::tavily-search, inline::file-search, remote::model-context-protocol |
| vector_io | inline::faiss, inline::sqlite-vec, inline::milvus, remote::chromadb, remote::pgvector, remote::qdrant, remote::weaviate, remote::elasticsearch, remote::infinispan |

Inference Providers

The starter distribution includes a comprehensive set of inference providers:

Hosted Providers

  • OpenAI: GPT-4, GPT-3.5, O1, O3, O4 models and text embeddings - provider ID: openai - reference documentation: openai
  • Fireworks: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and embeddings - provider ID: fireworks - reference documentation: fireworks
  • Together: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and embeddings - provider ID: together - reference documentation: together
  • Anthropic: Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude 3.5 Haiku, and Voyage embeddings - provider ID: anthropic - reference documentation: anthropic
  • Gemini: Gemini 1.5, 2.0, 2.5 models and text embeddings - provider ID: gemini - reference documentation: gemini
  • Groq: Fast Llama models (3.1, 3.2, 3.3, 4 Scout, 4 Maverick) - provider ID: groq - reference documentation: groq
  • SambaNova: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models - provider ID: sambanova - reference documentation: sambanova
  • Cerebras: Cerebras AI models - provider ID: cerebras - reference documentation: cerebras
  • NVIDIA: NVIDIA NIM - provider ID: nvidia - reference documentation: nvidia
  • Bedrock: AWS Bedrock models - provider ID: bedrock - reference documentation: bedrock

Local/Remote Providers

All local and remote providers are disabled by default; enable the ones you need by setting the corresponding environment variables (see Environment Variables and Enabling Providers below).
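As a minimal sketch, enabling a provider is just a matter of exporting its URL or API key before starting the server. The values below are placeholders; the full list of variables is given under Environment Variables:

```shell
# Enable the Ollama inference provider by pointing at a local Ollama server
# (URL is illustrative; adjust it to your deployment)
export OLLAMA_URL=http://localhost:11434

# Enable a hosted provider by supplying its API key (placeholder value)
export FIREWORKS_API_KEY=your_fireworks_api_key
```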

Vector IO

The starter distribution includes a comprehensive set of vector IO providers:

  • FAISS: Local FAISS vector store - enabled by default - provider ID: faiss
  • SQLite: Local SQLite vector store - disabled by default - provider ID: sqlite-vec
  • ChromaDB: Remote ChromaDB vector store - disabled by default - provider ID: chromadb
  • PGVector: PostgreSQL vector store - disabled by default - provider ID: pgvector
  • Milvus: Milvus vector store - disabled by default - provider ID: milvus
  • Qdrant: Qdrant vector store - disabled by default - provider ID: qdrant
  • Weaviate: Weaviate vector store - disabled by default - provider ID: weaviate
  • Elasticsearch: Elasticsearch vector store - disabled by default - provider ID: elasticsearch
  • Infinispan: Infinispan vector store - disabled by default - provider ID: infinispan

Environment Variables

The following environment variables can be configured:

Server Configuration

  • LLAMA_STACK_PORT: Port for the Llama Stack distribution server (default: 8321)

API Keys for Hosted Providers

  • OPENAI_API_KEY: OpenAI API key
  • FIREWORKS_API_KEY: Fireworks API key
  • TOGETHER_API_KEY: Together API key
  • ANTHROPIC_API_KEY: Anthropic API key
  • GEMINI_API_KEY: Google Gemini API key
  • GROQ_API_KEY: Groq API key
  • SAMBANOVA_API_KEY: SambaNova API key
  • CEREBRAS_API_KEY: Cerebras API key
  • LLAMA_API_KEY: Llama API key
  • NVIDIA_API_KEY: NVIDIA API key
  • HF_API_TOKEN: HuggingFace API token

Local Provider Configuration

  • OLLAMA_URL: Ollama server URL (default: http://localhost:11434)
  • VLLM_URL: vLLM server URL (default: http://localhost:8000/v1)
  • VLLM_MAX_TOKENS: vLLM max tokens (default: 4096)
  • VLLM_API_TOKEN: vLLM API token (default: fake)
  • VLLM_TLS_VERIFY: vLLM TLS verification (default: true)
  • TGI_URL: TGI server URL

Model Configuration

  • INFERENCE_MODEL: HuggingFace model for serverless inference
  • INFERENCE_ENDPOINT_NAME: HuggingFace endpoint name

Vector Database Configuration

  • SQLITE_STORE_DIR: SQLite store directory (default: ~/.llama/distributions/starter)
  • ENABLE_SQLITE_VEC: Enable SQLite vector provider
  • ENABLE_CHROMADB: Enable ChromaDB provider
  • ENABLE_PGVECTOR: Enable PGVector provider
  • CHROMADB_URL: ChromaDB server URL
  • PGVECTOR_HOST: PGVector host (default: localhost)
  • PGVECTOR_PORT: PGVector port (default: 5432)
  • PGVECTOR_DB: PGVector database name
  • PGVECTOR_USER: PGVector username
  • PGVECTOR_PASSWORD: PGVector password
  • MILVUS_URL: Milvus server URL
  • QDRANT_URL: Qdrant server URL
  • WEAVIATE_CLUSTER_URL: Weaviate cluster URL
  • ELASTICSEARCH_URL: Elasticsearch server URL (default: localhost:9200)
  • ELASTICSEARCH_API_KEY: Elasticsearch API key
  • INFINISPAN_URL: Infinispan server URL (default: http://localhost:11222)
  • INFINISPAN_USERNAME: Infinispan authentication username (default: admin)
  • INFINISPAN_PASSWORD: Infinispan authentication password

Tool Configuration

  • BRAVE_SEARCH_API_KEY: Brave Search API key
  • TAVILY_SEARCH_API_KEY: Tavily Search API key

Enabling Providers

You can enable specific providers by setting the appropriate environment variables. For example:

# self-hosted
export OLLAMA_URL=http://localhost:11434/v1 # enables the Ollama inference provider
export VLLM_URL=http://localhost:8000/v1 # enables the vLLM inference provider
export TGI_URL=http://localhost:8000/v1 # enables the TGI inference provider

# cloud-hosted requiring API key configuration on the server
export CEREBRAS_API_KEY=your_cerebras_api_key # enables the Cerebras inference provider
export NVIDIA_API_KEY=your_nvidia_api_key # enables the NVIDIA inference provider

# vector providers
export MILVUS_URL=http://localhost:19530 # enables the Milvus vector provider
export CHROMADB_URL=http://localhost:8000/v1 # enables the ChromaDB vector provider
export PGVECTOR_DB=llama_stack_db # enables the PGVector vector provider
export INFINISPAN_URL=http://localhost:11222 # enables the Infinispan vector provider

This distribution comes with a default "llama-guard" shield that can be enabled by setting the SAFETY_MODEL environment variable to point to an appropriate Llama Guard model id. Use llama-stack-client models list to see the list of available models.
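For example, the shield could be enabled like this. The model ID below is illustrative; pick one that your configured providers actually serve, as reported by llama-stack-client models list:

```shell
# Illustrative Llama Guard model ID -- substitute one from
# `llama-stack-client models list`
export SAFETY_MODEL=meta-llama/Llama-Guard-3-8B
```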

Running

See Starting a Llama Stack Server for all the ways to run (uv, container, library, Kubernetes).

Quick start:

uvx --from 'llama-stack[starter]' llama stack run starter
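Once the server is up, a quick sanity check is to list the models it serves. This sketch assumes the default port 8321 and that the llama-stack-client CLI is installed; it requires the server from the command above to be running:

```shell
# Point the client at the local server (8321 is the LLAMA_STACK_PORT default),
# then list the models exposed by the enabled providers
llama-stack-client configure --endpoint http://localhost:8321
llama-stack-client models list
```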

PostgreSQL Storage

By default, the starter distribution uses SQLite. For production, use PostgreSQL:

uvx --from 'llama-stack[starter]' llama stack run starter::run-with-postgres-store.yaml

Environment variables for PostgreSQL (each falls back to its default when unset):

| Variable | Default |
|---|---|
| POSTGRES_HOST | localhost |
| POSTGRES_PORT | 5432 |
| POSTGRES_DB | llamastack |
| POSTGRES_USER | llamastack |
| POSTGRES_PASSWORD | llamastack |
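Putting it together, a PostgreSQL-backed launch might look like the sketch below. The host and password are placeholders for your own database; any variable left unset falls back to its default:

```shell
# Placeholder PostgreSQL connection details -- replace with your own
export POSTGRES_HOST=db.example.internal
export POSTGRES_PORT=5432
export POSTGRES_DB=llamastack
export POSTGRES_USER=llamastack
export POSTGRES_PASSWORD=change_me

# Run the starter distribution against the PostgreSQL store
uvx --from 'llama-stack[starter]' llama stack run starter::run-with-postgres-store.yaml
```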