
InfoQ: Why Observability Matters (More!) with AI Applications

Feeling the pain of wrangling LLMs in production? Sally O’Malley shows why AI observability is your new best friend, unpacking the open-source stack you need—vLLM, Llama Stack, Prometheus, Tempo, and Grafana on Kubernetes—to keep business-critical AI workloads transparent and reliable. Learn the key signals to track (cost, performance, quality) for RAG, agentic, and multi-turn apps, and discover why the “prefill vs. decode” split changes the game.
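The prefill/decode split shows up directly in streaming latency metrics: time to first token (TTFT) is dominated by the prefill phase, while time per output token (TPOT) reflects decode throughput. A minimal sketch of separating the two from a token stream—the function name and stream shape here are illustrative, not from the talk:

```python
import time


def measure_streaming_latency(token_stream):
    """Split end-to-end latency of a streaming LLM response into a
    prefill-dominated part (time to first token, TTFT) and a
    decode-dominated part (time per output token, TPOT)."""
    start = time.monotonic()
    first_token_at = None
    now = start
    count = 0
    for _ in token_stream:
        now = time.monotonic()
        if first_token_at is None:
            first_token_at = now  # first token ends the prefill phase
        count += 1
    ttft = (first_token_at or now) - start   # prefill-dominated latency
    decode_time = now - (first_token_at or now)
    tpot = decode_time / max(count - 1, 1)   # avg decode time per token
    return ttft, tpot
```

Tracking these two numbers separately—rather than one end-to-end latency—is what lets you tell an overloaded prefill queue apart from slow decode throughput.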

In a live demo, she walks through setting up ServiceMonitors, deploying vLLM with llm-d, adding OTel sidecars for tracing, and building GPU-usage dashboards in Grafana. Stick around for the Q&A, where topics like open-source tooling costs, Langfuse, and persona-driven analytics come together to help you run AI at scale.
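The ServiceMonitor step boils down to a small Prometheus Operator manifest pointing at the vLLM Service's metrics endpoint. A rough sketch, assuming a Service labeled `app: vllm` with a port named `http` (all names here are illustrative, not from the demo):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vllm-metrics
  labels:
    release: prometheus     # must match your Prometheus's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: vllm             # assumed label on the vLLM Service
  endpoints:
    - port: http            # assumed Service port name
      path: /metrics        # vLLM exposes Prometheus metrics here
      interval: 30s
```

Once Prometheus is scraping this endpoint, the vLLM metrics become available as data sources for the Grafana dashboards covered in the demo.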

Watch on YouTube