Elena Samuylova from Evidently AI joins InfoQ to spill the tea on evaluating LLM-powered applications, covering testing strategies, monitoring tools, and real-world practices that keep your AI running smoothly in production.
She even dives into the meta-world of using LLMs as judges, exploring how one model can assess another's output and how that feedback can help you improve performance and reliability.
Watch on YouTube