
InfoQ: Why Observability Matters (More!) with AI Applications

Feeling the pain of wrangling LLMs in production? Sally O’Malley shows why AI observability is your new best friend, unpacking the open-source stack you need—vLLM, Llama Stack, Prometheus, Tempo, and Grafana on Kubernetes—to keep business-critical AI workloads transparent and reliable. Learn the key signals to track (cost, performance, quality) for RAG, agentic, and multi-turn apps, and discover why the “prefill vs. decode” split changes the game.
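The prefill/decode split shows up directly in streaming latency metrics: time to first token (TTFT) is dominated by the prefill phase, while time per output token (TPOT) reflects decode throughput. A minimal sketch of separating the two from a token stream—the function name and stream shape here are illustrative, not from the talk:

```python
import time


def measure_streaming_latency(token_stream):
    """Split end-to-end latency of a streaming LLM response into a
    prefill-dominated part (time to first token, TTFT) and a
    decode-dominated part (time per output token, TPOT)."""
    start = time.monotonic()
    first_token_at = None
    now = start
    count = 0
    for _ in token_stream:
        now = time.monotonic()
        if first_token_at is None:
            first_token_at = now  # first token ends the prefill phase
        count += 1
    ttft = (first_token_at or now) - start   # prefill-dominated latency
    decode_time = now - (first_token_at or now)
    tpot = decode_time / max(count - 1, 1)   # avg decode time per token
    return ttft, tpot
```

Tracking these two numbers separately—rather than one end-to-end latency—is what lets you tell an overloaded prefill queue apart from slow decode throughput.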

In a live demo, she walks through setting up ServiceMonitors, deploying vLLM with llm-d, adding OTel sidecars for tracing, and building GPU-usage dashboards in Grafana. Stick around for the Q&A, where topics like open-source tooling costs, Langfuse, and persona-driven analytics come together to help you run AI at scale.
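The ServiceMonitor step boils down to a small Prometheus Operator manifest pointing at the vLLM Service's metrics endpoint. A rough sketch, assuming a Service labeled `app: vllm` with a port named `http` (all names here are illustrative, not from the demo):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vllm-metrics
  labels:
    release: prometheus     # must match your Prometheus's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: vllm             # assumed label on the vLLM Service
  endpoints:
    - port: http            # assumed Service port name
      path: /metrics        # vLLM exposes Prometheus metrics here
      interval: 30s
```

Once Prometheus is scraping this endpoint, the vLLM metrics become available as data sources for the Grafana dashboards covered in the demo.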

Watch on YouTube