Why Observability Matters (More!) with AI Applications
AI apps aren’t your grandma’s microservices: they’re chatty, costly, and non-uniform. In this InfoQ talk, Sally O’Malley walks senior engineers through a fully open-source observability stack (vLLM, Llama Stack, Prometheus, Tempo, and Grafana on Kubernetes) that keeps RAG, agentic, and multi-turn systems transparent, reliable, and cost-aware.
You’ll see a live demo covering everything from ServiceMonitors and OTel sidecars to GPU-usage dashboards, plus deep dives into the key AI signals (performance, cost, and quality) and best practices for tracing LLM serving patterns such as prefill vs. decode. It’s everything you need to take an AI system from research prototype to business-critical production.
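To make the ServiceMonitor pattern mentioned above concrete, here is a minimal sketch of a Prometheus Operator `ServiceMonitor` that scrapes vLLM’s built-in `/metrics` endpoint. The namespace, labels, and port name are assumptions for illustration; adjust them to match your own vLLM Service.

```yaml
# Sketch: scrape vLLM's Prometheus metrics via the Prometheus Operator.
# Namespace, label selector, and port name below are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vllm-metrics
  namespace: observability       # assumed: where Prometheus Operator watches
spec:
  namespaceSelector:
    matchNames:
      - llm-serving              # assumed namespace of the vLLM deployment
  selector:
    matchLabels:
      app: vllm                  # assumed label on the vLLM Service
  endpoints:
    - port: http                 # assumed name of the port exposing /metrics
      path: /metrics
      interval: 15s
```

With this in place, Prometheus picks up vLLM’s serving metrics (request latency, token throughput, queue depth) automatically, and Grafana dashboards can be built on top of them.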
Watch on YouTube