Elena Samuylova on LLM Evaluation and LLM as Judge
Elena Samuylova from Evidently AI dives into best practices for evaluating LLM-based applications, covering key tools and techniques for testing and monitoring, and for keeping AI models reliable and fair. She also explores the emerging idea of using an LLM itself as a “judge” to assess outputs and spot potential issues.
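As a rough illustration of the “LLM as judge” idea discussed in the episode, the sketch below scores one model’s answer by asking a second LLM to grade it. This is not taken from the podcast or from Evidently AI’s tooling; the prompt, the 1–5 criteria, and the use of an OpenAI-compatible client are assumptions made purely for illustration.

```python
# Minimal LLM-as-judge sketch (illustrative only; prompt and criteria are assumptions).
# Assumes the `openai` Python package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are an impartial evaluator.
Question: {question}
Answer: {answer}

Rate the answer for correctness and helpfulness on a 1-5 scale.
Reply with only the number."""


def judge_answer(question: str, answer: str, model: str = "gpt-4o-mini") -> int:
    """Ask a second LLM to grade an answer; returns a 1-5 score."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
        temperature=0,  # deterministic grading makes scores easier to compare over time
    )
    return int(response.choices[0].message.content.strip())


if __name__ == "__main__":
    score = judge_answer("What does HTTP 404 mean?",
                         "The requested resource was not found on the server.")
    print("Judge score:", score)
```

In practice, judge prompts like this are themselves evaluated and tuned (for example, checked against human-labeled samples) before being trusted for monitoring, which is part of what the episode covers.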
InfoQ has the full podcast and transcript, plus a monthly Software Architects’ Newsletter. InfoQ also hosts events such as Dev Summit Munich and QCon (San Francisco, AI New York, and London), runs several ongoing podcasts, and welcomes contributions from anyone who wants to share their software-development insights.
Watch on YouTube