Why Observability Matters (More!) with AI Applications
Feeling the pain of wrangling LLMs in production? Sally O'Malley shows why AI observability is your new best friend, unpacking the open-source stack you need—vLLM, Llama Stack, Prometheus, Tempo, and Grafana on Kubernetes—to keep business-critical AI workloads transparent and reliable. Learn the key signals to track (cost, performance, quality) for RAG, agentic, and multi-turn apps, and discover why "prefill vs. decode" changes the game.
In a live demo, she walks you through setting up ServiceMonitors, deploying vLLM with llm-d, adding OTel sidecars for tracing, and building GPU-usage dashboards in Grafana. Stick around for the Q&A where open-source cost, Langfuse and persona-driven analytics come together to help you run AI at scale.
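As a rough sketch of the first demo step, a ServiceMonitor tells the Prometheus Operator which Service endpoints to scrape. The names, namespaces, and labels below are illustrative assumptions, not taken from the talk; vLLM exposes Prometheus metrics at `/metrics` on its serving port, so a minimal manifest might look like:

```yaml
# Illustrative sketch — resource names, namespaces, and labels are assumptions;
# adjust them to match your actual vLLM Service.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vllm
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: vllm          # must match the labels on your vLLM Service
  namespaceSelector:
    matchNames:
      - vllm             # namespace where vLLM is deployed
  endpoints:
    - port: http         # named Service port that exposes metrics
      path: /metrics     # vLLM serves Prometheus metrics here
      interval: 15s
```

Once Prometheus is scraping, the same metrics back the Grafana dashboards shown in the demo.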
Watch on YouTube