High Agency: The Podcast for AI Builders

Season 1, Episode 2

Evaluating LLMs the Right Way: Lessons from Hex's Journey

I recently sat down with Bryan Bischof, AI lead at Hex, to dive deep into how they evaluate LLMs to ship reliable AI agents. Hex has deployed AI assistants that can automatically generate SQL queries, transform data, and create visualizations based on natural language questions. While many teams struggle to get value from LLMs in production, Hex has cracked the code.

In this episode, Bryan shares the hard-won lessons they've learned along the way. We discuss why most teams are approaching LLM evaluation wrong and how Hex's unique framework enabled them to ship with confidence. 

Bryan breaks down the key ingredients to Hex's success:
- Choosing the right tools to constrain agent behavior
- Using a reactive DAG to allow humans to course-correct agent plans
- Building granular, user-centric evaluators instead of chasing one "god metric"
- Gating releases on the metrics that matter, not just gaming a score
- Constantly scrutinizing model inputs & outputs to uncover insights

For show notes and a transcript, go to:
https://hubs.ly/Q02BdzVP0
-----------------------------------------------------
Humanloop is an Integrated Development Environment for Large Language Models. It enables product teams to develop LLM-based applications that are reliable and scalable. To find out more, go to https://hubs.ly/Q02yV72D0

What is High Agency: The Podcast for AI Builders?

High Agency is the podcast for AI builders. If you’re trying to understand how to successfully build AI products with Large Language Models and Generative AI, then this podcast is made for you. Each week we interview leaders at companies building on the frontier who have already succeeded with AI in production. We share their stories, lessons, and playbooks so you can build more quickly and with confidence.

AI is moving incredibly fast and no one is truly an expert yet. High Agency is for people who are learning by doing and who share what they learn with the community.

Where to find us: https://hubs.ly/Q02z2HR40