LegacySWE

LegacySWE is the long-horizon coding benchmark for legacy software maintenance and modernization in enterprise systems.

pass@2 | pass@4
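| # | Model | Harness | Score |
|---|-------|---------|-------|
| 1 | DeepSeek V4 Pro | Terminus-2 | 11.0% ± 6.0% |
| 2 | GPT-5.5 | Codex CLI | 9.0% ± 5.5% |
| 2 | Claude Opus 4.7 | Claude Code | 9.0% ± 5.5% |
| 2 | Gemini 3.1 Pro | Terminus-2 | 9.0% ± 5.5% |
| 2 | GPT-5.5 | Terminus-2 | 9.0% ± 5.0% |
| 6 | GPT-5.4 Mini | Codex CLI | 6.0% ± 4.5% |
| 6 | Kimi K2.6 | Terminus-2 | 6.0% ± 4.5% |
| 8 | Claude Opus 4.7 | Terminus-2 | 5.0% ± 4.5% |
| 8 | Kimi K2.6 | Kimi CLI | 5.0% ± 4.0% |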