ML Safety Report

Hopes and fears of the current AI safety paradigm, GPU performance predictions, and popular literature on why machines will never rule the world. Welcome to the ML & AI Safety Update!

Show Notes

Join the AGI Safety Fundamentals course, starting next year! 
Altera, an EA-adjacent organization, is looking for members: 
Show your interest in joining the Machine Learning Alignment Bootcamp: 
[28 dec] Join workshops to learn about alignment as a career:  
[Today!] Global Challenges Project workshops in Oxford:  
[Today!] AI Testing Hackathon, see our livestream tonight:  

0:00 Intro
0:12 Hopes and fears of current AI safety
2:22 GPU forecasting and AGI?
3:45 Why Machines Will Never Rule the World
4:58 Other news
6:02 🎄 Opportunities

Karnofsky on hopes for AI safety with current methods: 1) digital neuroscience, 2) limited AI, and 3) AI checks and balances: 
Lance Armstrong, King Lear, lab mice, first contact
Christiano’s reminder that AI alignment is distinct from applied alignment:
Shlegeris’ RLHF critique:
Steiner on why RLHF / IDA / Debate do not solve outer alignment, showcasing the sharp left turn view:
Epoch AI’s GPU performance forecast, predicting that GPU progress (cores and transistors) may stall between 2027 and 2035:
Saba’s review of Keith’s “Why Machines Will Never Rule the World”:
Steve Byrnes’ research update: 
Discovering latent knowledge in language models: 
Eliciting latent knowledge problem:
Finite factored sets as a replacement for causal graphs: 
Binary correlating variables:
PIBBSS updates:
Model editing using task vector arithmetic:

What is ML Safety Report?

A weekly podcast updating you with the latest research in AI and machine learning safety from people such as DeepMind, Anthropic, and MIRI.