{"type":"rich","version":"1.0","provider_name":"Transistor","provider_url":"https://transistor.fm","author_name":"Daily Paper Cast","title":"Evolving Agents in the Dark: Retrospective Harness Optimization via Self-Preference","html":"<iframe width=\"100%\" height=\"180\" frameborder=\"no\" scrolling=\"no\" seamless src=\"https://share.transistor.fm/e/c45684dd\"></iframe>","width":"100%","height":180,"duration":1269,"description":"\n            🤗 Upvotes: 48 | cs.AI, cs.CL, cs.LG\n\n            Authors:\n            Wenbo Pan, Shujie Liu, Chin-Yew Lin, Jingying Zeng, Xianfeng Tang, Xiangyang Zhou, Yan Lu, Xiaohua Jia\n\n            Title:\n            Evolving Agents in the Dark: Retrospective Harness Optimization via Self-Preference\n\n            Arxiv:\n            http://arxiv.org/abs/2606.05922v2\n\n            Abstract:\n            AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is difficult to acquire in practical deployment settings. To address this problem, we introduce Retrospective Harness Optimization (RHO), a self-supervised method that optimizes the agent harness using only past trajectories. Specifically, RHO selects a diverse coreset of challenging tasks from past trajectories and re-solves them in parallel. The agent analyzes these rollouts using self-validation and self-consistency, then generates candidate harness updates and selects the most effective one by its own pairwise self-preference. We evaluate RHO across three diverse domains, spanning software engineering, technical work, and knowledge work. Notably, a single optimization round improves the pass rate on SWE-Bench Pro from 59% to 78% without any external grading. Furthermore, our analysis demonstrates that RHO effectively targets prior failure modes. As a result, the optimized harness alters the agent's behavior patterns and sustains higher accuracy during long-horizon sessions.\n            ","thumbnail_url":"https://img.transistorcdn.com/8lOVNnuwhrA3rxrDMv7Osu4j_t1-jORooO6NfGcQhcw/rs:fill:0:0:1/w:400/h:400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS81Zjg1/YzRhODczMDU4MmE4/OGMwN2FiNDlmYzI2/MDliMi5qcGVn.webp","thumbnail_width":300,"thumbnail_height":300}