Daily Paper Cast

🤗 Upvotes: 39 | cs.CV, cs.AI

Authors:
Yuran Wang, Bohan Zeng, Chengzhuo Tong, Wenxuan Liu, Yang Shi, Xiaochen Ma, Hao Liang, Yuanxing Zhang, Wentao Zhang

Title:
Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

Arxiv:
http://arxiv.org/abs/2512.12675v1

Abstract:
Subject-driven image generation has advanced from single- to multi-subject composition, while neglecting distinction, the ability to identify and generate the correct subject when inputs contain multiple candidates. This limitation restricts effectiveness in complex, realistic visual settings. We propose Scone, a unified understanding-generation method that integrates composition and distinction. Scone enables the understanding expert to act as a semantic bridge, conveying semantic information and guiding the generation expert to preserve subject identity while minimizing interference. A two-stage training scheme first learns composition, then enhances distinction through semantic alignment and attention-based masking. We also introduce SconeEval, a benchmark for evaluating both composition and distinction across diverse scenarios. Experiments demonstrate that Scone outperforms existing open-source models in composition and distinction tasks on two benchmarks. Our model, benchmark, and training data are available at: https://github.com/Ryann-Ran/Scone.

What is Daily Paper Cast?

We publish every weekday, discussing the highest-voted papers from Hugging Face Daily Papers (https://huggingface.co/papers). Both the podcast scripts and the audio are generated by AI. Feedback and suggestions are welcome! Email us: dailypapercast.ai@gmail.com

Creators:
Jingwen Liang, 3D ML, https://www.linkedin.com/in/jingwen-liang/
Gengyu Wang, LLM ML, http://wanggengyu.com

Listen on:
Spotify: https://open.spotify.com/show/21nrhmdaA8qoBiH8q03NXL
Apple Podcasts: https://podcasts.apple.com/us/podcast/daily-paper-cast/id1777620236

Cover Image by Kawen Kuang https://kawen.art