
Scale YouTube


InfoQ: Elena Samuylova on Large Language Model (LLM) Based Application Evaluation and LLM as a Judge

Elena Samuylova on LLM Evaluation and LLM as Judge

Elena Samuylova from Evidently AI dives into best practices for evaluating LLM-based applications, covering key tools and techniques for testing and monitoring AI models and for keeping them reliable and fair. She also explores the emerging idea of using an LLM itself as a “judge” to assess outputs and spot potential issues.

InfoQ has the full podcast and transcript, plus a monthly Software Architects’ Newsletter. It also hosts events like Dev Summit Munich and QCon (SF, AI New York, London), offers several ongoing podcasts, and welcomes contributions if you want to share your software-dev insights.

Watch on YouTube
