Scale YouTube

InfoQ: Elena Samuylova on Large Language Model (LLM) Based Application Evaluation and LLM as a Judge

Elena Samuylova from Evidently AI joins InfoQ to spill the tea on evaluating LLM-powered applications—covering everything from the right testing strategies and monitoring tools to real-world best practices that keep your AI running smoothly in production.

She even dives into the meta-world of using LLMs as judges, exploring how these models can assess each other’s output and help you fine-tune performance and reliability.
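For readers new to the idea, here is a minimal sketch of what an LLM-as-a-judge loop can look like. It is not taken from the talk or from Evidently's API: `JUDGE_PROMPT`, `judge_answers`, `pass_rate`, and `fake_judge` are all hypothetical names, and the model call is a stub you would swap for your own client.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical grading prompt; a real one would be tuned to your use case.
JUDGE_PROMPT = (
    "You are grading an answer from an AI assistant.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply with exactly one word: PASS if the answer is relevant and "
    "correct, otherwise FAIL."
)


@dataclass
class Verdict:
    question: str
    answer: str
    passed: bool


def judge_answers(
    pairs: list[tuple[str, str]],
    call_llm: Callable[[str], str],
) -> list[Verdict]:
    """Ask the judge model for a PASS/FAIL verdict on each (question, answer)."""
    verdicts = []
    for question, answer in pairs:
        reply = call_llm(JUDGE_PROMPT.format(question=question, answer=answer))
        # Treat anything that does not start with PASS as a failure.
        passed = reply.strip().upper().startswith("PASS")
        verdicts.append(Verdict(question, answer, passed))
    return verdicts


def pass_rate(verdicts: list[Verdict]) -> float:
    """Share of answers the judge accepted (0.0 for an empty list)."""
    if not verdicts:
        return 0.0
    return sum(v.passed for v in verdicts) / len(verdicts)


def fake_judge(prompt: str) -> str:
    # Stand-in for a real model call (assumption): fails unhelpful answers.
    return "FAIL" if "no idea" in prompt else "PASS"


if __name__ == "__main__":
    samples = [
        ("What is 2 + 2?", "4"),
        ("What is the capital of France?", "no idea"),
    ]
    results = judge_answers(samples, fake_judge)
    for v in results:
        print("PASS" if v.passed else "FAIL", "-", v.question)
    print("pass rate:", pass_rate(results))
```

Passing the judge in as a plain callable keeps the scoring logic testable without network access. In practice you would also want to spot-check the judge's verdicts against human labels before trusting the pass rate.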

Watch on YouTube