April 28, 2026: Unsloth Minimax-m2.7 added. It scores well but is one of the slowest models.
April 27, 2026: Until March 2026, OpenAI OSS 120b was the clear leader, even though it was launched all the way back in May of 2025. Google Gemma 4 has now taken the lead, followed by Qwen 3.6. Mistral 4 small trails even last year's models.
The models from 2026 have thinking disabled. Look at swedish_benchmark_fast for benchmarks with thinking enabled.
Overall score is benchmark-normalized, so each benchmark compares models against the others on that task before averaging. Missing coverage reduces the final score. For classification tasks the primary score is accuracy; for SweParaphrase it is Pearson correlation.
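As a rough illustration of how a benchmark-normalized overall score can behave, the sketch below rescales each benchmark with min-max normalization and then averages per model. The exact normalization used by the leaderboard is not stated here, so min-max scaling, counting a missing benchmark as zero, and the function name `overall_scores` are all assumptions made for this example.

```python
from statistics import mean


def overall_scores(raw):
    """Return {model: overall score in [0, 1]}.

    raw maps benchmark name -> {model name: primary score}.
    Assumption: min-max normalization per benchmark; the leaderboard's
    actual formula may differ.
    """
    models = sorted({m for scores in raw.values() for m in scores})
    per_model = {m: [] for m in models}

    for scores in raw.values():
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        for m in models:
            if m not in scores:
                # Assumption: a missing benchmark contributes zero,
                # which is one way missing coverage lowers the score.
                per_model[m].append(0.0)
            elif span == 0:
                # All models tied on this benchmark.
                per_model[m].append(1.0)
            else:
                per_model[m].append((scores[m] - lo) / span)

    return {m: mean(values) for m, values in per_model.items()}


# Hypothetical data: model_c has no result on the second benchmark.
example = {
    "classification_task": {"model_a": 0.9, "model_b": 0.5, "model_c": 0.7},
    "SweParaphrase": {"model_a": 0.8, "model_b": 0.4},
}
result = overall_scores(example)
```

In this made-up example, `model_c` lands in the middle on the first benchmark but its overall score is halved because it has no result on the second.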
This view averages each model across the benchmarks used by the leaderboard. Left (lower latency) and up (higher score) is better.
Each scatter plot shows one benchmark. The x-axis is average latency in seconds and the y-axis is that benchmark's primary score.
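To make the two axes concrete, here is a minimal sketch of how one point in such a plot could be derived from per-example results. The helper names (`accuracy`, `pearson`, `scatter_point`) and the input layout are hypothetical and are not taken from the benchmark code.

```python
from statistics import mean


def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels exactly."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def scatter_point(latencies_s, predictions, gold, metric):
    """Return (x, y): average latency in seconds and the primary score."""
    return mean(latencies_s), metric(predictions, gold)


# Hypothetical classification run: two requests, one correct answer.
x, y = scatter_point([1.0, 3.0], ["pos", "neg"], ["pos", "pos"], accuracy)
```

For a similarity task such as SweParaphrase, the same helper would be called with `pearson` as the metric and numeric similarity scores in place of labels.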
Compact per-benchmark summaries for the currently selected models.
Short descriptions for the benchmarks shown above.