{"type":"rich","version":"1.0","provider_name":"Transistor","provider_url":"https://transistor.fm","author_name":"Crazy Wisdom","title":"Episode #425: Agents, Evals, and the Future of AI: A Pragmatic Take with Christopher Canal","html":"<iframe width=\"100%\" height=\"180\" frameborder=\"no\" scrolling=\"no\" seamless src=\"https://share.transistor.fm/e/35720770\"></iframe>","width":"100%","height":180,"duration":2638,"description":"In this episode of Crazy Wisdom, Stewart Alsop welcomes Christopher Canal, co-founder of Equistamp, for a deep discussion on the current state of AI evaluations (evals), the rise of agents, and the safety challenges surrounding large language models (LLMs). Christopher breaks down how LLMs function, the significance of scaffolding for AI agents, and the complexities of running evals without data leakage. The conversation covers the risks associated with AI agents being used for malicious purposes, the performance limitations of long-time-horizon tasks, and the murky realm of interpretability in neural networks. Additionally, Christopher shares how Equistamp aims to offer third-party evaluations to combat principal-agent dilemmas in the industry. For more about Equistamp's work, visit Equistamp.com to explore their evaluation tools and consulting services tailored for AI and safety innovation.\n\nCheck out this GPT we trained on the conversation!\n\nTimestamps\n00:00 Introduction and Guest Welcome\n00:13 The Importance of Evals in AI\n01:32 Understanding AI Agents\n04:02 Challenges and Risks of AI Agents\n07:56 Future of AI Models and Competence\n16:39 The Concept of Consciousness in AI\n19:33 Current State of Evals and Data Leakage\n24:30 Defining Competence in AI\n31:26 Equistamp and AI Safety\n42:12 Conclusion and Contact Information\n\nKey Insights\nThe Importance of Evals in AI Development: Christopher Canal emphasizes that evaluations (evals) are crucial for measuring AI models' capabilities and potential risks. 
He highlights the uncertainty surrounding AI's trajectory and the need to accurately assess when AI systems outperform humans at specific tasks to guide responsible adoption. Without robust evals, companies risk overestimating AI's competence due to data leakage and flawed benchmarks.\n\nThe Role of Scaffolding in AI Agents: The conversation distinguishes between large language models (LLMs) and agents, with Christopher defining agents as systems operating within a feedback loop to...","thumbnail_url":"https://img.transistorcdn.com/UZbrDrlO5VTfDNcq188THwbv0T09vcmLyzx3BcPI9bs/rs:fill:0:0:1/w:400/h:400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS81Y2Rj/OGFiMTYyMGFkNTM5/N2NjOWI2MWM5YzQ1/YTc2Ny5qcGc.webp","thumbnail_width":300,"thumbnail_height":300}