MBPP EvalPlus - Benchmark Leaderboard & Model Performance | AI Stats
MBPP EvalPlus
Overview
Type: numerical
General
Recorded Results: 1
Average Score: 0.88
Score Range: 0.88 - 0.88
Leading Model: Llama 2 70B Chat (0.88)
[Chart: Scores Over Time. Individual benchmark scores plotted by date.]
Models Using This Benchmark
Organisation | Model | Reported | Top Score | Info | Self Reported | Source
Meta | Llama 2 70B Chat | 20 Jun 2023 | 0.88 | inferred family alias from llama-3.3-70b-instruct (score=0.3129; benches=9) | Yes | Source