InfoQ: Why Observability Matters (More!) with AI Applications

Why Observability Matters (More!) with AI Applications dives into why running LLM-powered apps in production is a whole different beast than your average microservice—think non-uniform workloads, sky-high costs, and unpredictable behavior. Sally O’Malley walks you through an open-source stack (vLLM, Llama Stack, Prometheus, Tempo, Grafana) on Kubernetes, showing how to wire up monitoring, tracing, and dashboards with a live demo.

Along the way you'll learn the three must-track signals (performance, cost, and quality) for everything from RAG to agentic pipelines, see GPU usage and vLLM metrics in Grafana, and pick up tips on ServiceMonitors, OTel sidecars, llm-d, and more. It's all capped off with a Q&A on open-source costs and actionable analytics for different roles.
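If you want a feel for the ServiceMonitor piece before watching: a minimal sketch of wiring vLLM's Prometheus metrics into the stack the talk describes. This assumes vLLM is deployed behind a Kubernetes Service labeled `app: vllm` with a named port, and that the Prometheus Operator is installed; the names and namespaces here are illustrative, not from the talk.

```yaml
# Sketch: Prometheus Operator ServiceMonitor scraping vLLM's metrics endpoint.
# Assumes a vLLM Service labeled app: vllm with a port named "http"
# (names and namespaces are illustrative).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vllm-metrics
  namespace: monitoring
  labels:
    release: prometheus      # must match your Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: vllm
  namespaceSelector:
    matchNames:
      - llm-serving          # namespace where vLLM runs (assumed)
  endpoints:
    - port: http             # named port on the vLLM Service
      path: /metrics         # vLLM exposes Prometheus metrics here by default
      interval: 15s
```

Once Prometheus picks this up, the vLLM request and throughput metrics become available for the Grafana dashboards shown in the demo.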

Watch on YouTube
