# Starting a Llama Stack Server

You can start a Llama Stack server in any of the following ways:
- uv (recommended)
- Container
- As a Library
- Kubernetes
## uv (recommended)

The fastest way to get started. No global install needed:

```shell
uvx --from 'llama-stack[starter]' llama stack run starter
```

Or, if you have a project with llama-stack as a dependency:

```shell
uv run llama stack run starter
```
## Container

Run a pre-built container image:

```shell
docker run -it \
  -p 8321:8321 \
  -v ~/.llama:/root/.llama \
  -e OLLAMA_URL=http://host.docker.internal:11434 \
  llamastack/distribution-starter
```
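On Linux, `host.docker.internal` is not resolvable by default; Docker's `--add-host` flag with the special `host-gateway` value maps it to the host. A sketch of the same command adapted for Linux:

```shell
# Linux variant: map host.docker.internal to the host's gateway IP
docker run -it \
  -p 8321:8321 \
  -v ~/.llama:/root/.llama \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_URL=http://host.docker.internal:11434 \
  llamastack/distribution-starter
```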
See Building Custom Distributions to create your own image.
## As a Library

Use Llama Stack directly in your Python process without running a server:

```python
from llama_stack.core.library_client import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient("starter")
client.initialize()
```
See Using Llama Stack as a Library for details.
## Kubernetes

Deploy the container image to a Kubernetes cluster. See the Kubernetes Deployment Guide.
The server runs at `http://localhost:8321` by default. Use `--port` to change it.
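Once the server is up, you can sanity-check it from another terminal. A quick sketch, assuming the default port and that the server exposes a `/v1/health` endpoint:

```shell
# Ask the running server for its health status (default port 8321)
curl http://localhost:8321/v1/health
```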
## Logging

Control log output via environment variables:

```shell
# Per-component levels
LLAMA_STACK_LOGGING=server=debug,core=info llama stack run starter

# Global level
LLAMA_STACK_LOGGING=all=debug llama stack run starter

# Log to file
LLAMA_STACK_LOG_FILE=/tmp/llama-stack.log llama stack run starter
```
Categories: `all`, `core`, `server`, `router`, `inference`, `safety`, `tools`, `client`.
Levels: `debug`, `info`, `warning`, `error`, `critical`.
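The variable's value is a list of `component=level` pairs. To make the format concrete, here is a minimal Python sketch that parses such a string into standard `logging` level numbers; `parse_log_spec` is a hypothetical helper for illustration, not part of Llama Stack itself:

```python
import logging

def parse_log_spec(spec: str) -> dict[str, int]:
    """Turn a string like 'server=debug,core=info' into a
    mapping of component name -> numeric logging level."""
    levels = {}
    for pair in spec.split(","):
        component, _, level = pair.partition("=")
        # logging.DEBUG == 10, logging.INFO == 20, etc.
        levels[component.strip()] = getattr(logging, level.strip().upper())
    return levels

print(parse_log_spec("server=debug,core=info"))
# → {'server': 10, 'core': 20}
```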