SWE Bench Multilingual

Figure: Scores Over Time (individual benchmark scores plotted by date).

SWE Bench Multilingual - Benchmark Leaderboard & Model Performance | AI Stats

Models Using This Benchmark

Organisation	Model	Date Reported	Top Score	Notes	Self-Reported	Source
Anthropic	Claude Mythos Preview	07 Apr 2026	87.30%	-	Yes	Source
Anthropic	Claude Opus 4.7	16 Apr 2026	80.50%	-	Yes	Source
Qwen	Qwen 3.7 Max	21 May 2026	78.30%	Internal agent scaffold	Yes	Source
Anthropic	Claude Opus 4.6	05 Feb 2026	77.83%	-	Yes	Source
MiniMax	MiniMax M2.7	18 Mar 2026	76.50%	-	Yes	Source
Anthropic	Claude Opus 4.5	24 Nov 2025	76.20%	Avg@5	Yes	Source
Anthropic	Claude Sonnet 4.6	17 Feb 2026	75.90%	-	Yes	Source
Qwen	Qwen 3.6 Plus	01 Apr 2026	73.80%	-	Yes	Source
Moonshot	Kimi K2.5	27 Jan 2026	73.00%	-	Yes	Source
MiniMax	MiniMax M2.1	23 Dec 2025	72.50%	Outperforms Claude Sonnet 4.5 (68%) and DeepSeek V3.2 (70.2%)	Yes	Source
ByteDance	Seed 2.0 Pro	14 Feb 2026	71.70%	Seed2 official benchmark table (SWE Multilingual)	Yes	Source
Xiaomi	MiMo V2 Pro	18 Mar 2026	71.70%	-	Yes	Source
Xiaomi	MiMo V2 TTS	18 Mar 2026	71.70%	Score inferred from mimo-v2-pro (modality/version alias)	Yes	Source
Xiaomi	MiMo V2 Flash	16 Dec 2025	71.70%	-	Yes	Source
Qwen	Qwen 3.5 397B A17B	16 Feb 2026	69.30%	-	Yes	Source
z.AI	GLM 4.7	22 Dec 2025	66.70%	-	Yes	Source
ByteDance	Seed 2.0 Lite	14 Feb 2026	64.40%	Seed2 official benchmark table (SWE Multilingual)	Yes	Source
Moonshot	Kimi K2 Thinking	06 Nov 2025	61.10%	Score inferred from kimi-k2-thinking-0905 (alias)	Yes	Source
DeepSeek	DeepSeek OCR 2	-	57.90%	Score inferred from deepseek-v3.2-exp (family alias; score=0.3809; benches=14)	Yes	Source
DeepSeek	DeepSeek V3.2 Exp	29 Sep 2025	57.90%	-	Yes	Source
MiniMax	MiniMax M2 Her	24 Jan 2026	56.50%	Score inferred from minimax-m2 (modality/version alias)	Yes	-
MiniMax	MiniMax M2	27 Oct 2025	56.50%	-	Yes	-
Qwen	Qwen 3 Coder 480B A35B Instruct	-	54.70%	-	Yes	Source
DeepSeek	DeepSeek V3.1	21 Aug 2025	54.50%	Evaluated with internal code agent framework	Yes	Source
DeepSeek	DeepSeek V3.1 Terminus	22 Sep 2025	54.50%	Score inferred from deepseek-v3.1 (alias)	Yes	Source
Moonshot	Kimi K2 (2025-09-05)	05 Sep 2025	47.30%	Single attempt (Acc)	Yes	Source
Nvidia	Nemotron 3 Super	11 Mar 2026	45.78%	-	Yes	Source