{"type":"rich","version":"1.0","provider_name":"Transistor","provider_url":"https://transistor.fm","author_name":"Daily Paper Cast","title":"X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents","html":"<iframe width=\"100%\" height=\"180\" frameborder=\"no\" scrolling=\"no\" seamless src=\"https://share.transistor.fm/e/600a9634\"></iframe>","width":"100%","height":180,"duration":1258,"description":"\n            🤗 Upvotes: 25 | cs.CR, cs.AI, cs.CL, cs.LG, cs.MA\n\n            Authors:\n            Salman Rahman, Liwei Jiang, James Shiffer, Genglin Liu, Sheriff Issaka, Md Rizwan Parvez, Hamid Palangi, Kai-Wei Chang, Yejin Choi, Saadia Gabriel\n\n            Title:\n            X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents\n\n            Arxiv:\n            http://arxiv.org/abs/2504.13203v1\n\n            Abstract:\n            Multi-turn interactions with language models (LMs) pose critical safety risks, as harmful intent can be strategically spread across exchanges. Yet, the vast majority of prior work has focused on single-turn safety, while adaptability and diversity remain among the key challenges of multi-turn red-teaming. To address these challenges, we present X-Teaming, a scalable framework that systematically explores how seemingly harmless interactions escalate into harmful outcomes and generates corresponding attack scenarios. X-Teaming employs collaborative agents for planning, attack optimization, and verification, achieving state-of-the-art multi-turn jailbreak effectiveness and diversity with success rates up to 98.1% across representative leading open-weight and closed-source models. In particular, X-Teaming achieves a 96.2% attack success rate against the latest Claude 3.7 Sonnet model, which has been considered nearly immune to single-turn attacks. Building on X-Teaming, we introduce XGuard-Train, an open-source multi-turn safety training dataset that is 20x larger than the previous best resource, comprising 30K interactive jailbreaks, designed to enable robust multi-turn safety alignment for LMs. Our work offers essential tools and insights for mitigating sophisticated conversational attacks, advancing the multi-turn safety of LMs.\n            ","thumbnail_url":"https://img.transistorcdn.com/8lOVNnuwhrA3rxrDMv7Osu4j_t1-jORooO6NfGcQhcw/rs:fill:0:0:1/w:400/h:400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS81Zjg1/YzRhODczMDU4MmE4/OGMwN2FiNDlmYzI2/MDliMi5qcGVn.webp","thumbnail_width":300,"thumbnail_height":300}