Llama Stack Observability: Metrics, Traces, and Dashboards with OpenTelemetry
Running an LLM application in production is nothing like running a traditional web service. Responses are non-deterministic. Latency swings wildly with model size and token count. And failures are often silent — a tool call that returns garbage still comes back as a 200 OK. You can stare at your HTTP dashboard all day and have no idea that half your users are getting bad answers.
We recently shipped built-in observability for Llama Stack, powered by OpenTelemetry. Three environment variables, zero code changes, and you get metrics and traces from every layer — HTTP requests, inference calls, tool invocations, vector store operations, all the way down.
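To make that concrete, here is a minimal sketch of what an env-var-only setup can look like. The variable names are an assumption borrowed from standard OpenTelemetry SDK conventions, and "starter" is a placeholder distribution name; the hands-on tutorial later in this post covers the exact configuration Llama Stack reads.

```python
# A sketch of the zero-code setup, assuming the standard OpenTelemetry
# SDK environment variables. Llama Stack's actual variable names may
# differ; see the tutorial below for the real configuration.
import os
import subprocess

env = {
    **os.environ,
    # Where the OTLP exporter ships metrics and traces,
    # e.g. an OpenTelemetry Collector's HTTP endpoint.
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318",
    # Send OTLP over HTTP/protobuf rather than gRPC.
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    # The service name your dashboards group telemetry under.
    "OTEL_SERVICE_NAME": "llama-stack",
}

# Launch the server as usual; the instrumentation picks everything up
# from the environment, so application code stays untouched.
# ("starter" is a placeholder for whichever distribution you run.)
subprocess.run(["llama", "stack", "run", "starter"], env=env, check=True)
```

Because configuration travels through the environment rather than code, the same mechanism works whether you launch the server from a shell, a container entrypoint, or a process manager.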
This post explains the architecture behind it, walks through a hands-on tutorial, and shows what you can actually see once it's running.
