Many SWE-bench-Passing PRs would not be merged

📰 Updated 2026-03-12 06:30

🔗 Many SWE-bench-Passing PRs would not be merged
🔥 33 points

Original:
Summary: We find that roughly half of test-passing SWE-bench Verified PRs written by mid-2024 to mid/late-2025 agents would not be merged into main by repo maintainers, even after adjusting for noise in maintainer merge decisions. Since the agents are not given a chance to iterate on their solution in response to feedback the way a human developer would, we do not claim that this represents a fundamental capability limitation. Rather, our results indicate that a naive interpretation of benchm…

