Individual benchmark scores plotted by date.
| Organisation | Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|---|
| Anthropic | Claude Opus 4.7 | 16 Apr 2026 | 86.30% | - | Yes | Source |
| Anthropic | Claude Sonnet 5 | 30 Jun 2026 | 73.30% | Exact-match accuracy on Anthropic internal agentic harness; mean of five trials | Yes | Source |
| OpenAI | GPT 5.4 | 05 Mar 2026 | 68.10% | - | Yes | Source |
| OpenAI | GPT 5.5 | 23 Apr 2026 | 54.10% | Pro | Yes | Source |