{"type":"rich","version":"1.0","provider_name":"Transistor","provider_url":"https://transistor.fm","author_name":"Daily Paper Cast","title":"What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards","html":"<iframe width=\"100%\" height=\"180\" frameborder=\"no\" scrolling=\"no\" seamless src=\"https://share.transistor.fm/e/8848d4b9\"></iframe>","width":"100%","height":180,"duration":1335,"description":"\n            🤗 Upvotes: 41 | cs.CV\n\n            Authors:\nMinh-Quan Le, Yuanzhi Zhu, Vicky Kalogeiton, Dimitris Samaras\n\n            Title:\nWhat about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards\n\n            Arxiv:\nhttp://arxiv.org/abs/2512.00425v1\n\n            Abstract:\nRecent video diffusion models can synthesize visually compelling clips, yet often violate basic physical laws: objects float, accelerations drift, and collisions behave inconsistently, revealing a persistent gap between visual realism and physical realism. We propose $\\texttt{NewtonRewards}$, the first physics-grounded post-training framework for video generation based on $\\textit{verifiable rewards}$. Instead of relying on human or VLM feedback, $\\texttt{NewtonRewards}$ extracts $\\textit{measurable proxies}$ from generated videos using frozen utility models: optical flow serves as a proxy for velocity, while high-level appearance features serve as a proxy for mass. These proxies enable explicit enforcement of Newtonian structure through two complementary rewards: a Newtonian kinematic constraint enforcing constant-acceleration dynamics, and a mass conservation reward preventing trivial, degenerate solutions. We evaluate $\\texttt{NewtonRewards}$ on five Newtonian Motion Primitives (free fall, horizontal/parabolic throw, and ramp sliding down/up) using our newly constructed large-scale benchmark, $\\texttt{NewtonBench-60K}$. 
Across all primitives in visual and physics metrics, $\\texttt{NewtonRewards}$ consistently improves physical plausibility, motion smoothness, and temporal coherence over prior post-training methods. It further maintains strong performance under out-of-distribution shifts in height, speed, and friction. Our results show that physics-grounded verifiable rewards offer a scalable path toward physics-aware video generation.\n            ","thumbnail_url":"https://img.transistorcdn.com/8lOVNnuwhrA3rxrDMv7Osu4j_t1-jORooO6NfGcQhcw/rs:fill:0:0:1/w:400/h:400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS81Zjg1/YzRhODczMDU4MmE4/OGMwN2FiNDlmYzI2/MDliMi5qcGVn.webp","thumbnail_width":300,"thumbnail_height":300}