Llama Stack Observability: Metrics, Traces, and Dashboards with OpenTelemetry
Running an LLM application in production is nothing like running a traditional web service. Responses are non-deterministic. Latency swings wildly with model size and token count. And failures are often silent — a tool call that returns garbage still comes back as a 200 OK. You can stare at your HTTP dashboard all day and have no idea that half your users are getting bad answers.
We recently shipped built-in observability for Llama Stack, powered by OpenTelemetry. Three environment variables, zero code changes, and you get metrics and traces from every layer — HTTP requests, inference calls, tool invocations, vector store operations, all the way down.
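To make that concrete, here is a minimal sketch of what an env-var-only setup can look like. The variable names are an assumption borrowed from standard OpenTelemetry SDK conventions, and "starter" is a placeholder distribution name; the hands-on tutorial later in this post covers the exact configuration Llama Stack reads.

```python
# A sketch of the zero-code setup, assuming the standard OpenTelemetry
# SDK environment variables. Llama Stack's actual variable names may
# differ; see the tutorial below for the real configuration.
import os
import subprocess

env = {
    **os.environ,
    # Where the OTLP exporter ships metrics and traces,
    # e.g. an OpenTelemetry Collector's HTTP endpoint.
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318",
    # Send OTLP over HTTP/protobuf rather than gRPC.
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    # The service name your dashboards group telemetry under.
    "OTEL_SERVICE_NAME": "llama-stack",
}

# Launch the server as usual; the instrumentation picks everything up
# from the environment, so application code stays untouched.
# ("starter" is a placeholder for whichever distribution you run.)
subprocess.run(["llama", "stack", "run", "starter"], env=env, check=True)
```

Because configuration travels through the environment rather than code, the same mechanism works whether you launch the server from a shell, a container entrypoint, or a process manager.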
This post explains the architecture behind it, walks through a hands-on tutorial, and shows what you can actually see once it's running.
