Darryl and I discuss his background, how he became interested in machine learning, and a project we are currently working on that investigates penalizing polysemanticity during the training of neural networks.
Check out a diagram of the decoder task used for our research!
01:46 - Interview begins
02:14 - Supernovae classification
08:58 - Penalizing polysemanticity
20:58 - Our "toy model"
30:06 - Task description
32:47 - Addressing hurdles
39:20 - Lessons learned
Links to all articles and papers mentioned throughout the episode are listed below, in order of their appearance.
Zooniverse
BlueDot Impact
AI Safety Support
Zoom In: An Introduction to Circuits
MNIST dataset on PapersWithCode
Clusterability in Neural Networks
CIFAR-10 dataset
Effective Altruism Global
CLIP (blog post)
Long Term Future Fund
Engineering Monosemanticity in Toy Models
The Into AI Safety podcast aims to make it easier for everyone, regardless of background, to get meaningfully involved in the conversations surrounding the rules and regulations that should govern the research, development, deployment, and use of the technologies encompassed by the term "artificial intelligence," or "AI."
For better-formatted show notes, additional resources, and more, go to https://kairos.fm/intoaisafety/