ML Safety Report

Hopes and fears of the current AI safety paradigm, GPU performance predictions, and popular literature on why machines will never rule the world. Welcome to the ML & AI Safety Update!

Show Notes

Opportunities
Join the AGI Safety Fundamentals course, starting next year! https://ais.pub/agisf2 
Astera, an EA-adjacent organization, is looking for members: https://ais.pub/astera 
Show your interest in joining the Machine Learning for Alignment Bootcamp (MLAB): https://ais.pub/mlab 
[28 Dec] Join workshops to learn about alignment as a career: https://ais.pub/aisworkshops  
[Today!] Global Challenges Project workshops in Oxford: https://ais.pub/gcp  
[Today!] AI Testing Hackathon, see our livestream tonight: https://ais.pub/jamlive  

0:00 Intro
0:12 Hopes and fears of current AI safety
2:22 GPU forecasting and AGI?
3:45 Why Machines Will Never Rule the World
4:58 Other news
6:02 🎄 Opportunities

Sources:
Karnofsky on hopes for AI safety with current methods: 1) digital neuroscience, 2) limited AI, and 3) AI checks and balances: https://www.alignmentforum.org/posts/7BWmLhFtqzqEPs8d5/high-level-hopes-for-ai-alignment 
Karnofsky’s analogies from the post: Lance Armstrong, King Lear, lab mice, and first contact
Christiano’s reminder that AI alignment is distinct from its near-term applications: https://www.alignmentforum.org/posts/Hw26MrLuhGWH7kBLm/ai-alignment-is-distinct-from-its-near-term-applications
Shlegeris’ RLHF critique: https://www.alignmentforum.org/posts/NG6FrXgmqPd5Wn3mh/trying-to-disambiguate-different-questions-about-whether
Steiner argues that RLHF / IDA / Debate do not solve outer alignment, showcasing the left turn view: https://www.alignmentforum.org/posts/6YNZt5xbBT5dJXknC/take-9-no-rlhf-ida-debate-doesn-t-solve-outer-alignment
Epoch AI’s prediction of GPU performance, with progress expected to stop between 2027 and 2035; cores and transistors: https://epochai.org/blog/predicting-gpu-performance
Saba’s review of Keith’s “Why Machines Will Never Rule the World”: https://www.youtube.com/watch?v=IMnWAuoucjo
Steve Byrnes’ research update: https://www.alignmentforum.org/posts/qusBXzCpxijTudvBB/my-agi-safety-research-2022-review-23-plans 
Discovering latent knowledge in language models: https://arxiv.org/pdf/2212.03827.pdf 
Eliciting latent knowledge problem: https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit
Finite factored sets as a replacement for causal graphs: https://www.alignmentforum.org/posts/PfcQguFpT8CDHcozj/finite-factored-sets-in-pictures-6 
Two correlated binary variables (Pearl example): https://www.alignmentforum.org/posts/N5Jm6Nj4HkNKySA5Z/finite-factored-sets#2e__Two_Binary_Variables__Pearl_
PIBBSS updates: https://www.alignmentforum.org/posts/gbeyjALdjdoCGayc6/reflections-on-the-pibbss-fellowship-2022#Overview_of_main_updates
Model editing using task vector arithmetic: https://arxiv.org/pdf/2212.04089.pdf

What is ML Safety Report?

A weekly podcast updating you on the latest research in AI and machine learning safety from organizations such as DeepMind, Anthropic, and MIRI.