OSWorld - Benchmark Leaderboard & Model Performance | AI Stats
OSWorld
Overview
Recorded Results: 1
Average Score: 0.66
Score Range: 0.66 - 0.66
Leading Model: Claude Opus 4.5 (0.66)
[Figure: Scores Over Time. Individual benchmark scores plotted by date.]
Models Using This Benchmark
Organisation | Model | Reported | Top Score | Info | Self Reported | Source
Anthropic | Claude Opus 4.5 | 24 Nov 2025 | 0.66 | Pass@1; Avg@5, 64k Thinking | Yes | Source