Thomas Sohmers joins to discuss dropping out of high school at age 17 to start a chip company, lessons from the successes and failures of past processor architectures, the history of VLIW, and the new AI hardware appliances he and his team are building at Positron AI.
Thomas on X: https://twitter.com/trsohmers
Thomas' Site: https://www.trsohmers.com/
Show Notes
- Welcome Thomas Sohmers (00:01:22)
- Growing Up Around Computers (00:03:13)
- Digging Beneath the Software (00:05:56)
- Learning Python, C, and Arduino C (00:07:05)
- Learning About the Thiel Fellowship (00:07:44)
- Starting Research at MIT at age 14 (00:09:24)
- Dropping out of High School and Starting Thiel Fellowship at age 17 (00:10:36)
- MIT ISN Lab (00:11:09)
- Evaluating ARM Processors for High Performance Computing (00:11:28)
- ARM Calxeda Processor (00:11:38)
- Scaling Out Low Power Processors for Data Center Compute (00:12:27)
- Incorporating REX Computing (00:13:42)
- Facebook and the Open Compute Project (00:14:18)
- Deciding Against Arm (00:14:49)
- ARMv8 (00:15:12)
- Deciding to Design a New Architecture (00:16:26)
- Multiflow (00:18:23)
- Good Architecture Ideas from the Past (00:18:35)
- Thomas' Talk at Stanford (00:18:59)
- RISC vs. CISC Debate (00:19:37)
- SPARC Instruction Set (00:20:04)
- The Importance of History (00:20:58)
- RISC Came Before CISC (00:23:08)
- CDC 6600 (00:23:20)
- Load-Store Architecture (00:23:53)
- IBM System/360 (00:24:02)
- PowerPC (00:24:29)
- VLIW (00:25:02)
- ELI-512 and Josh Fisher (00:25:05)
- Floating Point Systems, Inc. (FPS) (00:26:45)
- Multiflow Compiler (00:26:52)
- Instruction Level Parallelism (00:27:33)
- Intel Itanium (00:28:20)
- Itanium is not a VLIW Architecture (00:29:04)
- Explicitly Parallel Instruction Computing (EPIC) (00:29:22)
- x86 and Pentium (00:30:18)
- Impact of Branch Prediction and Caching on Determinism (00:31:34)
- Why Itanium Failed (00:32:27)
- REX's NEO Architecture (00:35:29)
- Hard Real-Time Determinism (00:35:41)
- Scratchpad Memory (00:35:54)
- Removing Memory Management (TLB, MMU, etc.) (00:36:18)
- ALU, FPU, and Register Files (00:37:14)
- Benefits of Removing Implicit Caching Layers (00:38:30)
- VLIW in Signal Processing (00:39:51)
- VLIW Won in a Silent Way (00:40:49)
- Original Reason for Hardware-Managed Caching (00:41:26)
- Impact of VLIW and Software-Managed Memory on Compile Times (00:42:41)
- LLVM and Sufficiently Advanced Open Source Compilers (00:42:49)
- Apple Transition from PowerPC to x86 to Arm (00:43:31)
- Static Single-Assignment Form (00:44:11)
- Impact of More Powerful Personal Machines on VLIW (00:45:07)
- Software is the Hard Part of New Hardware (00:45:35)
- LLVM Frontends, IR, and Backends (00:46:20)
- Qualcomm Hexagon DSP (00:47:22)
- Paul Sebexen (00:48:08)
- Basic Linear Algebra Subprograms (BLAS) (00:49:21)
- Fast Fourier Transform (FFT) (00:49:33)
- Working on Software in Parallel with Hardware (00:50:00)
- Verilator (00:51:09)
- Cadence Incisive (00:51:18)
- Synthesizing RTL to Netlist Every Day (00:52:07)
- FPGA vs. ASIC Design Flow (00:53:57)
- Open Source Synthesis Tools (00:54:49)
- OpenMPW (00:57:01)
- FPGA Simulation Post Tape-out (00:57:25)
- Xilinx UltraScale (00:57:48)
- What Happened to the NEO Chips (00:59:02)
- Floating Point Performance Per Watt vs. Nvidia A100 (01:00:11)
- Winding Down REX (01:00:51)
- Positron AI (01:05:11)
- NeurIPS Exhibit (01:05:48)
- 5x Performance Per Dollar over Nvidia H100 (01:06:07)
- Balancing Memory Bandwidth and Compute (01:06:48)
- Software Interface for Inference Accelerator (01:08:15)
- Trained Model File Formats (01:09:13)
- Benefits of Direct Ingestion of Model Files (01:09:33)
- The Importance of Understanding Customer Pain Points (01:12:33)
- Current Pain Points for Inference (01:14:12)
- Performance Per Dollar (01:15:16)
- Load Balancing Requests in a Hardware Agnostic Manner (01:16:20)
- OpenAI API (Informal) Specification (01:16:47)
- Future Proofing Hardware (01:17:31)
- State Space Models (Hyena, Mamba) (01:18:38)
- Mixture of Experts (MoE) (GPT-4, Gemini, Mixtral) (01:19:33)
- High Bandwidth Memory (HBM) (01:20:32)
- Chip on Wafer on Substrate (CoWoS) (01:20:49)
- Keeping up with Thomas and Positron (01:21:34)
More: https://microarch.club/episodes/10
What is Microarch Club?
The art, science, and history of processor design.