# Starting a Llama Stack Server

You can start a Llama Stack server in any of the following ways:
- uv (recommended)
- Container
- As a Library
- Kubernetes
## uv (recommended)

The fastest way to get started. No global install needed:

```shell
uvx --from 'llama-stack[starter]' llama stack run starter
```

Or, if you have a project with llama-stack as a dependency:

```shell
uv run llama stack run starter
```
## Container

Run a pre-built container image:

```shell
docker run -it \
  -p 8321:8321 \
  -v ~/.llama:/root/.llama \
  -e OLLAMA_URL=http://host.docker.internal:11434 \
  llamastack/distribution-starter
```
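On Linux, `host.docker.internal` is not resolvable by default; Docker's `--add-host` flag with the special `host-gateway` value maps it to the host. A sketch of the same command adapted for Linux:

```shell
# Linux variant: map host.docker.internal to the host's gateway IP
docker run -it \
  -p 8321:8321 \
  -v ~/.llama:/root/.llama \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_URL=http://host.docker.internal:11434 \
  llamastack/distribution-starter
```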
See Building Custom Distributions to create your own image.
## As a Library

Use Llama Stack directly in your Python process without running a server:

```python
from llama_stack.core.library_client import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient("starter")
client.initialize()
```
See Using Llama Stack as a Library for details.
## Kubernetes

Deploy the container image to a Kubernetes cluster. See the Kubernetes Deployment Guide.
The server runs at `http://localhost:8321` by default. Use `--port` to change it.
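Once the server is up, you can sanity-check it from another terminal. A quick sketch, assuming the default port and that the server exposes a `/v1/health` endpoint:

```shell
# Ask the running server for its health status (default port 8321)
curl http://localhost:8321/v1/health
```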
## Logging

Control log output via environment variables:

```shell
# Per-component levels
LLAMA_STACK_LOGGING=server=debug,core=info llama stack run starter

# Global level
LLAMA_STACK_LOGGING=all=debug llama stack run starter

# Log to file
LLAMA_STACK_LOG_FILE=/tmp/llama-stack.log llama stack run starter
```
Categories: `all`, `core`, `server`, `router`, `inference`, `safety`, `tools`, `client`.
Levels: `debug`, `info`, `warning`, `error`, `critical`.
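The variable's value is a list of `component=level` pairs. To make the format concrete, here is a minimal Python sketch that parses such a string into standard `logging` level numbers; `parse_log_spec` is a hypothetical helper for illustration, not part of Llama Stack itself:

```python
import logging

def parse_log_spec(spec: str) -> dict[str, int]:
    """Turn a string like 'server=debug,core=info' into a
    mapping of component name -> numeric logging level."""
    levels = {}
    for pair in spec.split(","):
        component, _, level = pair.partition("=")
        # logging.DEBUG == 10, logging.INFO == 20, etc.
        levels[component.strip()] = getattr(logging, level.strip().upper())
    return levels

print(parse_log_spec("server=debug,core=info"))
# → {'server': 10, 'core': 20}
```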