Pretrained

Kimi's serving architecture, using Mooncake to offload the KV cache from GPU memory to other hardware, the ubiquity of vLLM, and the emerging standard LLM stack

What is Pretrained?

Ten years after studying at Stanford, two friends have somehow become AI experts. One builds startups, the other studies at Cambridge. Together, they break down LLMs and machine learning with zero BS and maximum banter.