Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Claude Mythos Preview | 07 Apr 2026 | 94.60% | - | Yes | Source | |
| GPT 5.4 Pro | 05 Mar 2026 | 94.40% | - | Yes | Source | |
| Gemini 3.1 Pro Preview | 19 Feb 2026 | 94.30% | No Tools | Yes | - | |
| Claude Opus 4.7 | 16 Apr 2026 | 94.20% | - | Yes | Source | |
| Gemini 3 Pro Preview | 18 Nov 2025 | 93.80% | Deep Think, Tools Off | Yes | Source | |
| GPT 5.2 Pro | 11 Dec 2025 | 93.20% | No Tools | Yes | Source | |
| GPT 5.4 | 05 Mar 2026 | 92.80% | - | Yes | Source | |
| GPT 5.2 | 11 Dec 2025 | 92.40% | No Tools | Yes | Source | |
| Claude Opus 4.6 | 05 Feb 2026 | 91.30% | - | Yes | Source | |
| Grok 4.20 | 17 Feb 2026 | 91.10% | Artificial Analysis structured model metrics | No | Source | |
| Claude Sonnet 4.6 | 17 Feb 2026 | 89.90% | - | Yes | Source | |
| Muse Spark | 08 Apr 2026 | 89.50% | - | Yes | Source | |
| Grok 4 Heavy | 10 Jul 2025 | 88.90% | - | Yes | Source | |
| Seed 2.0 Pro | 14 Feb 2026 | 88.90% | Seed2 official benchmark table \| GPQA Diamond | Yes | Source | |
| GPT 5.4 Mini | 17 Mar 2026 | 88% | - | Yes | Source | |
| o3 Preview | 20 Dec 2024 | 87.70% | - | Yes | Source | |
| Grok 4 | 10 Jul 2025 | 87.50% | No Tools | Yes | Source | |
| GPT 5 | 07 Aug 2025 | 87.30% | Pass@1 | Yes | Source | |
| Claude Opus 4.5 | 24 Nov 2025 | 86.95% | Avg@5, 64k Thinking | Yes | Source | |
| Gemini 3.1 Flash Lite Preview | 03 Mar 2026 | 86.90% | - | Yes | Source | |
| Gemini 2.5 Pro Preview (2025-06-05) | 05 Jun 2025 | 86.40% | Single Attempt | Yes | Source | |
| GLM 5.1 | - | 86.20% | - | Yes | Source | |
| DeepSeek V3.2 Speciale | 01 Dec 2025 | 85.70% | - | Yes | Source | |
| Seed 2.0 Lite | 14 Feb 2026 | 85.10% | Seed2 official benchmark table \| GPQA Diamond | Yes | Source | |
| Claude 3.7 Sonnet | 24 Feb 2025 | 84.80% | - | Yes | Source | |
| GLM 5 Turbo | 15 Mar 2026 | 84.70% | Artificial Analysis structured model metrics | No | Source | |
| Grok 3 Beta | 19 Feb 2025 | 84.60% | Think, Cons@64 | Yes | Source | |
| Gemma 4 31B | 02 Apr 2026 | 84.30% | - | Yes | Source | |
| Grok 3 Mini Beta | 19 Feb 2025 | 84% | Think, Cons@64 | Yes | Source | |
| o3 Pro | 10 Jun 2025 | 84% | - | Yes | Source | |
| Claude Sonnet 4 | 21 May 2025 | 83.80% | - | Yes | - | |
| Claude Opus 4 | 21 May 2025 | 83.30% | - | Yes | - | |
| o3 | 16 Apr 2025 | 83.30% | - | Yes | Source | |
| Gemini 2.5 Pro Preview (2025-05-06) | 06 May 2025 | 83% | Pass@1 | Yes | Source | |
| GPT 5.4 Nano | 17 Mar 2026 | 82.80% | - | Yes | Source | |
| Gemini 2.5 Flash Preview (2025-05-20) | 20 May 2025 | 82.80% | Pass@1 | Yes | Source | |
| Gemma 4 26B A4B | 02 Apr 2026 | 82.30% | - | Yes | Source | |
| GPT 5 Mini | 07 Aug 2025 | 82.30% | High Reasoning Effort, No Tools | Yes | Source | |
| Nova 2 Pro | 02 Dec 2025 | 81.40% | - | Yes | Source | |
| o4 Mini | 16 Apr 2025 | 81.40% | - | Yes | Source | |
| Qwen 3 235B A22B Thinking 2507 | - | 81.10% | - | Yes | Source | |
| DeepSeek R1 (2025-05-28) | 28 May 2025 | 81% | - | Yes | Source | |
| Claude Opus 4.1 | 05 Aug 2025 | 80.90% | - | Yes | Source | |
| GPT OSS 120b | 05 Aug 2025 | 80.90% | High Reasoning Effort, With Tools | Yes | Source | |
| Grok 3 Mini | 18 Apr 2025 | 80.30% | High Reasoning Effort | Yes | Source | |
| o3 mini | 30 Jan 2025 | 79.70% | High Reasoning Effort | Yes | - | |
| Nova 2 Lite | 02 Dec 2025 | 79.60% | - | Yes | Source | |
| Grok 3 | 18 Apr 2025 | 79.10% | - | Yes | Source | |
| o1 pro | 19 Mar 2025 | 79% | - | Yes | Source | |
| Seed 2.0 Mini | 14 Feb 2026 | 79% | Seed2 official benchmark table \| GPQA Diamond | Yes | Source | |
| Gemini 2.5 Flash Preview (2025-04-17) | 17 Apr 2025 | 78.30% | Thinking, Single Attempt | Yes | Source | |
| Gemini 2.0 Flash | 05 Feb 2025 | 78.30% | Single Attempt | Yes | Source | |
| o1 | 17 Dec 2024 | 78% | - | Yes | Source | |
| Qwen 3 235B A22B Instruct 2507 | - | 77.50% | - | Yes | Source | |
| Trinity Large Thinking | 01 Apr 2026 | 76.30% | Hugging Face model card benchmark table (arcee-ai/Trinity-Large-Thinking) | Yes | Source | |
| Llama 3.1 Nemotron Ultra 253B v1 | 07 Apr 2025 | 76% | - | Yes | Source | |
| EXAONE 4.0 32B | 15 Jul 2025 | 75.40% | Reasoning | Yes | Source | |
| Kimi K2 (2025-09-05) | 05 Sep 2025 | 75.10% | Avg@8 | Yes | Source | |
| GPT OSS 20b | 05 Aug 2025 | 74.20% | High Reasoning Effort, With Tools | Yes | Source | |
| o1 preview | 12 Sep 2024 | 73.30% | - | Yes | Source | |
| Solar Pro 3 (2026-01-26) | 26 Jan 2026 | 72.40% | Artificial Analysis structured model metrics | No | Source | |
| DeepSeek R1 (2025-01-20) | 20 Jan 2025 | 71.50% | - | No | Source | |
| GPT 4.5 | 27 Feb 2025 | 71.40% | - | Yes | - | |
| Ministral 3.0 14B | 02 Dec 2025 | 71.20% | - | Yes | Source | |
| GPT 5 Nano | 07 Aug 2025 | 71.20% | High Reasoning Effort, No Tools | Yes | Source | |
| Magistral Medium 1.0 | 10 Jun 2025 | 70.80% | - | Yes | Source | |
| Phi 4 Reasoning Plus | 30 Apr 2025 | 68.90% | - | Yes | Source | |
| DeepSeek V3 (2025-03-24) | 25 Mar 2025 | 68.40% | - | Yes | Source | |
| Magistral Small 1.0 | 10 Jun 2025 | 68.20% | - | Yes | Source | |
| Claude 3.5 Sonnet (2024-06-20) | 21 Jun 2024 | 67.20% | - | Yes | Source | |
| Ministral 3.0 8B | 02 Dec 2025 | 66.80% | - | Yes | Source | |
| Gemini 2.5 Flash Lite Preview (2025-06-17) | 17 Jun 2025 | 66.70% | Thinking | Yes | Source | |
| Llama 3.3 Nemotron Super 49B v1 | 18 Mar 2025 | 66.70% | - | Yes | Source | |
| GPT 4.1 | 14 Apr 2025 | 66.30% | - | Yes | Source | |
| Qwen 3 30B A3B | - | 65.80% | - | Yes | Source | |
| Phi 4 Reasoning | 30 Apr 2025 | 65.80% | - | Yes | Source | |
| QwQ 32B | - | 65.20% | - | Yes | Source | |
| QwQ 32B Preview | - | 65.20% | - | Yes | Source | |
| Claude 3.5 Sonnet (2024-10-22) | 22 Oct 2024 | 65% | - | Yes | - | |
| GPT 4.1 Mini | 14 Apr 2025 | 65% | - | Yes | Source | |
| o1 mini | 12 Sep 2024 | 60% | - | Yes | Source | |
| DeepSeek V3 (2024-12-26) | 26 Dec 2024 | 59.10% | - | No | Source | |
| Nova Premier | 30 Apr 2025 | 57.10% | - | Yes | - | |
| Phi 4 | 12 Dec 2024 | 56.10% | - | Yes | Source | |
| Grok 2 | 13 Aug 2024 | 56% | - | Yes | Source | |
| Llama 3.1 Nemotron Nano 8B V1 | 18 Mar 2025 | 54.10% | - | Yes | Source | |
| Ministral 3.0 3B | 02 Dec 2025 | 53.40% | - | Yes | Source | |
| EXAONE 4.0 1.2B | 15 Jul 2025 | 52% | Reasoning | Yes | Source | |
| Phi 4 Mini Reasoning | 30 Apr 2025 | 52% | - | Yes | Source | |
| Grok 2 Mini | 13 Aug 2024 | 51% | - | Yes | Source | |
| Llama 3.1 405B Instruct | 23 Jul 2024 | 50.70% | - | Yes | Source | |
| Llama 3.3 70B Instruct | 06 Dec 2024 | 50.50% | - | Yes | Source | |
| Claude 3 Opus | 04 Mar 2024 | 50.40% | - | Yes | Source | |
| GPT 4.1 Nano | 14 Apr 2025 | 50.30% | - | Yes | Source | |
| Qwen 2.5 32B | - | 49.50% | - | Yes | Source | |
| Qwen 2.5 72B | - | 49% | - | Yes | Source | |
| Kimi K2 (2025-07-11) | 11 Jul 2025 | 48.10% | Avg@8 | Yes | Source | |
| Qwen 3 235B A22B | - | 47.50% | - | Yes | Source | |
| Nova Pro 1.0 | 04 Dec 2024 | 46.90% | - | Yes | Source | |
| Mistral Small 3.2 | 20 Jun 2025 | 46.13% | - | Yes | Source | |
| Mistral Small 3.1 | 17 Mar 2025 | 46% | - | Yes | Source | |
| Qwen 2.5 VL 32B Instruct | - | 46% | - | Yes | Source | |
| GPT 4o (2024-08-06) | 06 Aug 2024 | 46% | - | Yes | Source | |
| Qwen 2.5 14B | - | 45.50% | - | Yes | Source | |
| Mistral Small 3.0 | 30 Jan 2025 | 45.30% | - | Yes | Source | |
| Mistral Large 3.0 | 02 Dec 2025 | 43.90% | 5 Shot, No Reasoning | Yes | Source | |
| Qwen 2 72B Instruct | - | 42.40% | - | Yes | Source | |
| Gemma 3 27B | 12 Mar 2025 | 42.40% | - | Yes | Source | |
| Nova Lite 1.0 | 04 Dec 2024 | 42% | - | Yes | Source | |
| Llama 3.1 70B Instruct | 23 Jul 2024 | 41.70% | - | Yes | Source | |
| Claude 3.5 Haiku | 04 Nov 2024 | 41.60% | - | Yes | Source | |
| Gemma 3 12B | 12 Mar 2025 | 40.90% | - | Yes | Source | |
| Claude 3 Sonnet | 04 Mar 2024 | 40.40% | - | Yes | Source | |
| Gemini Diffusion | 20 May 2025 | 40.40% | Pass@1 | Yes | Source | |
| GPT 4o Mini (2024-07-18) | 18 Jul 2024 | 40.20% | - | Yes | Source | |
| Nova Micro 1.0 | 04 Dec 2024 | 40% | - | Yes | Source | |
| Jamba Large 1.6 | 06 Mar 2025 | 38.70% | - | No | Source | |
| Jamba Large 1.5 | 22 Aug 2024 | 36.90% | - | Yes | Source | |
| Phi 3.5 MoE instruct | 23 Aug 2024 | 36.80% | - | Yes | Source | |
| Qwen 2.5 7B | - | 36.40% | - | Yes | Source | |
| Grok 1.5 | 28 Mar 2024 | 35.90% | - | Yes | Source | |
| Gemini 1.0 Ultra | 06 Dec 2023 | 35.70% | - | Yes | Source | |
| GPT 4 (2023-03-14) | 14 Mar 2023 | 35.70% | - | Yes | Source | |
| Claude 3 Haiku | 13 Mar 2024 | 33.30% | - | Yes | Source | |
| Llama 3.2 3B Instruct | 25 Sep 2024 | 32.80% | - | Yes | Source | |
| Jamba Mini 1.5 | 22 Aug 2024 | 32.30% | - | Yes | Source | |
| Qwen 2.5 Omni 7B | - | 30.80% | - | Yes | Source | |
| GPT 3.5 Turbo 0613 | - | 30.80% | - | No | - | |
| Gemma 3 4B | 12 Mar 2025 | 30.80% | - | Yes | Source | |
| Llama 3.1 8B Instruct | 23 Jul 2024 | 30.40% | - | Yes | Source | |
| Phi 3.5 mini instruct | 23 Aug 2024 | 30.40% | - | Yes | Source | |
| Jamba Mini 1.6 | 06 Mar 2025 | 30% | - | No | Source | |
| Gemini 1.0 Pro | 06 Dec 2023 | 27.90% | - | No | - | |
| Qwen 2 7B Instruct | - | 25.30% | - | Yes | Source | |
| Gemma 3 1B | 12 Mar 2025 | 19.20% | - | Yes | Source |