

InfoQ: Why Observability Matters (More!) with AI Applications

AI apps powered by LLMs are nothing like your run-of-the-mill microservices: they’re non-uniform, costly, and full of surprises. In this InfoQ talk, Sally O’Malley breaks down why observability is the secret sauce to reliable, business-critical AI. You’ll get the exact open-source stack (vLLM, Llama Stack, Prometheus, Tempo, and Grafana on Kubernetes) and learn which signals (cost, performance, quality) really matter for RAG, agentic, and multi-turn workflows.
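If you want to poke at those cost and performance signals before watching, here’s a minimal sketch (not from the talk) of recording token counts and latency as OpenTelemetry span attributes around a call to a vLLM-served, OpenAI-compatible endpoint. The collector address, service URL, model name, and attribute keys are illustrative assumptions; swap in your own.

```python
import time

from openai import OpenAI
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Export spans over OTLP to a local collector/sidecar (Tempo can ingest OTLP).
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("llm-app")

# vLLM exposes an OpenAI-compatible API; the URL and model here are assumptions.
client = OpenAI(base_url="http://vllm.example.svc:8000/v1", api_key="unused")

def traced_completion(prompt: str) -> str:
    with tracer.start_as_current_span("llm.chat_completion") as span:
        start = time.monotonic()
        resp = client.chat.completions.create(
            model="meta-llama/Llama-3.1-8B-Instruct",
            messages=[{"role": "user", "content": prompt}],
        )
        # Latency and token usage are the raw material for cost/performance panels.
        span.set_attribute("llm.latency_s", time.monotonic() - start)
        span.set_attribute("llm.prompt_tokens", resp.usage.prompt_tokens)
        span.set_attribute("llm.completion_tokens", resp.usage.completion_tokens)
        return resp.choices[0].message.content
```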

The session features a live demo that runs from spinning up vLLM with llm-d to building ServiceMonitors in Kubernetes and tracing with OTel sidecars. You’ll see GPU-usage dashboards, end-to-end traces, and how to spot prefill vs. decode patterns. In short, it’s your roadmap to running AI reliably in production with full transparency.
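To look at that prefill-vs-decode split on your own cluster, one quick approach is to query Prometheus for vLLM’s latency histograms: time to first token roughly tracks prefill, time per output token roughly tracks decode. The sketch below assumes vLLM’s usual metric names (verify them against your deployment’s /metrics) and a made-up Prometheus URL.

```python
import requests

PROM_URL = "http://prometheus.example.svc:9090/api/v1/query"

QUERIES = {
    # Time to first token ~ prefill latency per request.
    "p95_time_to_first_token_s": (
        "histogram_quantile(0.95, "
        "sum(rate(vllm:time_to_first_token_seconds_bucket[5m])) by (le))"
    ),
    # Time per output token ~ decode latency.
    "p95_time_per_output_token_s": (
        "histogram_quantile(0.95, "
        "sum(rate(vllm:time_per_output_token_seconds_bucket[5m])) by (le))"
    ),
}

for name, promql in QUERIES.items():
    result = requests.get(PROM_URL, params={"query": promql}, timeout=10).json()
    for series in result["data"]["result"]:
        print(name, series["value"][1])
```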

Watch on YouTube
