Pretrained

Kimi K2 and Moonshot AI's history, avoiding loss spikes during training, the Muon optimizer, and data parallelism

What is Pretrained?

Ten years after studying at Stanford, two friends have somehow become AI experts. One builds startups, the other studies at Cambridge. Together they break down LLMs and machine learning with zero BS and maximum banter.