{"type":"rich","version":"1.0","provider_name":"Transistor","provider_url":"https://transistor.fm","author_name":"Pretrained","title":"Evaluation metrics for reasoning models","html":"<iframe width=\"100%\" height=\"180\" frameborder=\"no\" scrolling=\"no\" seamless src=\"https://share.transistor.fm/e/3a00c6e0\"></iframe>","width":"100%","height":180,"duration":1965,"description":"Evaluating models on benchmarks, passing a model vibe check, formal reasoning to synthesize datasets, and what type of datasets researchers prefer","thumbnail_url":"https://img.transistorcdn.com/8veBHYJ1tFjtWlv9ET3YGaLijqZK5MYE6tVoUbwgKaw/rs:fill:0:0:1/w:400/h:400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS8yYTRi/YTQ5Zjk4ZjIzMmU2/YzRiMWZjN2E5ZmJk/NzNjNi5wbmc.webp","thumbnail_width":300,"thumbnail_height":300}