In her InfoQ talk, Sally O'Malley explains why running AI apps in production is a totally different ballgame: more expensive, less uniform and harder to predict than classic microservices. She makes the case that observability is your secret weapon, letting you track the three must-have signals of cost, performance and quality across RAG, agentic and multi-turn workflows.
Then she jumps into a live Kubernetes demo, showing you how to stitch together vLLM, Llama Stack, Prometheus, Tempo and Grafana (plus OTel sidecars and llm-d) to build an end-to-end monitoring solution. By the end, you’ll know exactly which dashboards and traces to set up so your AI workloads stay reliable, transparent and cost-effective.
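To make the cost signal concrete, here is a minimal sketch of per-request LLM metrics: it is not code from the talk, and the price constants and names (`LLMRequestMetrics`, `timed_call`) are hypothetical stand-ins for what Prometheus counters would capture in the demo stack.

```python
import time
from dataclasses import dataclass

# Hypothetical per-1K-token prices; real values depend on your model/deployment.
PROMPT_PRICE_USD = 0.0005
COMPLETION_PRICE_USD = 0.0015

@dataclass
class LLMRequestMetrics:
    """One record per LLM call: the raw inputs for cost and latency signals."""
    prompt_tokens: int
    completion_tokens: int
    latency_s: float

    @property
    def cost_usd(self) -> float:
        # Token-based cost: (tokens / 1000) * price per 1K tokens, per category.
        return (self.prompt_tokens / 1000 * PROMPT_PRICE_USD
                + self.completion_tokens / 1000 * COMPLETION_PRICE_USD)

def timed_call(llm_fn, prompt: str):
    """Wrap any LLM callable and emit a metrics record alongside its output.

    llm_fn is assumed to return (text, prompt_tokens, completion_tokens).
    """
    start = time.perf_counter()
    text, p_tok, c_tok = llm_fn(prompt)
    elapsed = time.perf_counter() - start
    return text, LLMRequestMetrics(p_tok, c_tok, elapsed)
```

In a real deployment you would export these values as Prometheus metrics (and attach them to OTel spans) rather than keeping them in-process, but the arithmetic behind the cost dashboards is the same.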
Watch on YouTube