ML Safety Report

This week, we look at interpretability applied to a Go-playing neural network, glitchy tokens, and the opinions and actions of top AI labs and entrepreneurs.


  • Stanford's AI100 prize invites essays on how AI will affect our lives, work, and society at large. Applications close at the end of this month: 
  • You can apply for a paid three-month fellowship with AI Safety Info to write answers and summaries for alignment questions and topics: 
  • The Future of Life Institute has open rolling applications for remote, full-time roles and internships: 
  • Similarly, the Epoch team has an expression of interest to join their talented research team: 
  • You can apply for a postdoc / research scientist position in language model alignment at New York University with Sam Bowman and his team: 
  • Of course, you can join our AI governance hackathon: 


What is ML Safety Report?

A weekly podcast updating you on the latest research in AI and machine learning safety from organizations such as DeepMind, Anthropic, and MIRI.