ML Safety Report

We look at Bing going bananas, see that certification mechanisms can be exploited and that scaling oversight seems like a solvable problem from our latest hackathon results.

Opportunities
Sources
Evan Hubinger aggregate list: https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT/bing-chat-is-blatantly-aggressively-misaligned 

What is ML Safety Report?

A weekly podcast updating you with the latest research in AI and machine learning safety from people such as DeepMind, Anthropic, and MIRI.