{"type":"rich","version":"1.0","provider_name":"Transistor","provider_url":"https://transistor.fm","author_name":"Future of Life Institute Podcast","title":"Why AI Evaluation Science Can't Keep Up (with Carina Prunkl)","html":"<iframe width=\"100%\" height=\"180\" frameborder=\"no\" scrolling=\"no\" seamless src=\"https://share.transistor.fm/e/14615be3\"></iframe>","width":"100%","height":180,"duration":3263,"description":"Carina Prunkl is a researcher at Inria. She joins the podcast to discuss how to assess the capabilities and risks of general-purpose AI. We examine why systems can solve hard coding and math problems yet still fail at simple tasks, why pre-deployment tests often miss real-world behavior, and how faster capability gains can increase misuse risks. The conversation also covers de-skilling, red teaming, layered safeguards, and warning signs that AIs might undermine oversight.\nLINKS:\nCarina Prunkl personal website\nCHAPTERS:\n(00:00) Episode Preview\n(01:04) Introducing the report\n(02:10) Jagged frontier capabilities\n(05:29) Formal reasoning progress\n(12:36) Risks and evaluation science\n(19:00) Funding evaluation capacity\n(24:03) Autonomy and de-skilling\n(31:32) Authenticity and AI companions\n(41:00) Defense in depth methods\n(48:34) Loss of control risks\n(53:16) Where to read report\nPRODUCED BY:\nhttps://aipodcast.ing\nSOCIAL LINKS:\nWebsite: https://podcast.futureoflife.org\nTwitter (FLI): https://x.com/FLI_org\nTwitter (Gus): https://x.com/gusdocker\nLinkedIn: https://www.linkedin.com/company/future-of-life-institute/\nYouTube: https://www.youtube.com/channel/UC-rCCy3FQ-GItDimSR9lhzw/\nApple: https://geo.itunes.apple.com/us/podcast/id1170991978\nSpotify: https://open.spotify.com/show/2Op1WO3gwVwCrYHg4eoGyP","thumbnail_url":"https://img.transistorcdn.com/fFhIC-s2qSlHXzmJI7qMGts2WuLwImi4tWmRLH9EdPg/rs:fill:0:0:1/w:400/h:400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS81MmU5/MDZjZGQ5OTI0MDc5/YTk2ZTAxYTgwYTNk/M2VlOC5qcGc.webp","thumbnail_width":300,"thumbnail_height":300}