Daily Paper Cast

🤗 Upvotes: 51 | cs.CV, cs.AI

Authors:
Shuai Wang, Zhi Tian, Weilin Huang, Limin Wang

Title:
DDT: Decoupled Diffusion Transformer

Arxiv:
http://arxiv.org/abs/2504.05741v2

Abstract:
Diffusion transformers have demonstrated remarkable generation quality, albeit requiring longer training iterations and numerous inference steps. In each denoising step, diffusion transformers encode the noisy inputs to extract the lower-frequency semantic component and then decode the higher frequency with identical modules. This scheme creates an inherent optimization dilemma: encoding low-frequency semantics necessitates reducing high-frequency components, creating tension between semantic encoding and high-frequency decoding. To resolve this challenge, we propose a new \textbf{\color{ddt}D}ecoupled \textbf{\color{ddt}D}iffusion \textbf{\color{ddt}T}ransformer~(\textbf{\color{ddt}DDT}), with a decoupled design of a dedicated condition encoder for semantic extraction alongside a specialized velocity decoder. Our experiments reveal that a more substantial encoder yields performance improvements as model size increases. For ImageNet $256\times256$, Our DDT-XL/2 achieves a new state-of-the-art performance of {1.31 FID}~(nearly $4\times$ faster training convergence compared to previous diffusion transformers). For ImageNet $512\times512$, Our DDT-XL/2 achieves a new state-of-the-art FID of 1.28. Additionally, as a beneficial by-product, our decoupled architecture enhances inference speed by enabling the sharing self-condition between adjacent denoising steps. To minimize performance degradation, we propose a novel statistical dynamic programming approach to identify optimal sharing strategies.

What is Daily Paper Cast?

We update every weekday to discuss highest-voted papers from Huggingface Daily Paper (https://huggingface.co/papers). Both the podcast scripts and audio are generated by AI. Feedback and suggestions are welcome! Email us: dailypapercast.ai@gmail.com

Creator:
Jingwen Liang, 3D ML, https://www.linkedin.com/in/jingwen-liang/
Gengyu Wang, LLM ML, http://wanggengyu.com

Listen on:
Spotify: https://open.spotify.com/show/21nrhmdaA8qoBiH8q03NXL
Apple Podcast: https://podcasts.apple.com/us/podcast/daily-paper-cast/id1777620236

Cover Image by Kawen Kuang https://kawen.art