{"type":"rich","version":"1.0","provider_name":"Transistor","provider_url":"https://transistor.fm","author_name":"START","title":"Jay Ram | Beyond Evals: Build Environments That Make Agents Better","html":"<iframe width=\"100%\" height=\"180\" frameborder=\"no\" scrolling=\"no\" seamless src=\"https://share.transistor.fm/e/b6b582a8\"></iframe>","width":"100%","height":180,"duration":1321,"description":"Jay Ram is Founder & CEO of Hud, the evaluation and RL platform for AI agents. Hud helps startups build RL environments, run fast reward loops, and plug into any RL backend—so teams can cut costs and push last-mile accuracy once they've hit PMF. Before Hud, Jay left a lucrative quant career, shipped an AI prank-calling app that briefly hit #1 on the App Store (≈500k calls), and decided he wanted harder problems and smarter customers. He's a YC W25 alum; Hud is already used by researchers at foundation labs and is expanding into enterprise environments. Jay's catalyst was realizing he didn't want to just talk weekends—he wanted to build. He and his co-founders first tackled computer-use evals for labs. Inside that work, the language shifted: labs asking for \"evals\" really needed environments—places where you design rewards, iterate, and actually improve model behavior. Today, Jay frames Hud as the \"Next.js of RL environments\": opinionated lifecycle, backend-agnostic training, and infra that returns signal fast. Early on, use a foundation model; post-PMF, train your own with SFT/RL—that's where environments matter. 
Looking ahead, he sees post-training speciation: domain-tuned models for finance, accounting, creative tooling, and more—because teams will own more of their stack again. Key Topics Covered: · What Hud is: tools to set up your agent for RL, define tasks, shape rewards, and plug into RFT/other RL backends. · From evals to environments: why scores measure but rewards improve—and how iteration loops change outcomes. · Where it fits: use foundation models early; post-PMF train your own for cost leverage + last-mile gains. · Design + infra: a new category needs opinionated UX and fast results; why lab researchers use Hud for computer-use evals. · Market timing: the \"DeepSeek moment\" pulled RL from hobbyists into enterprise interest in 2025. · Pre-train vs post-train: scale vs accuracy + domain depth—and why post-training is the real edge. · Future of work: enterprises...","thumbnail_url":"https://img.transistorcdn.com/q-E1hh7K6IS4AfZiNy2p4MYVGcUOO8lQP92h8QbOEOA/rs:fill:0:0:1/w:400/h:400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS80NzVj/MDEzNjkxNjU1N2Uy/NDFhMDQ3M2ZhNWI3/NWY0MS5wbmc.webp","thumbnail_width":300,"thumbnail_height":300}