AI Language Bench Dashboard

Swedish Benchmarks V1.1

May 21, 2026: Added MTP (Multi-Token Prediction) for Qwen3.6 21B, run with LM-Studio 0.14.3. MTP does not seem to improve score or latency. Thinking is disabled for these tests, however.

May 1, 2026: Added Nvidia Nemotron 3 Nano Omni.

April 29, 2026: Added three knowledge base benchmarks: Swedish culture, economy, and society.

April 28, 2026: Added Unsloth Minimax-m2.7.

March 2026: Until March 2026, Open AI OSS 120b was the clear leader, even though it launched all the way back in May of 2025. Google Gemma 4 has now taken the lead, followed by Qwen 3.6. Mistral 4 small trails even last year's models.

Generated: 2026-05-21 15:04:50

Leaderboard uses 75% task score and 25% latency.

Hardware Used

System: GMKTec EVO-X2
Processor: AMD Ryzen AI Max+ 395
Memory Split: 32 GB RAM / 96 GB VRAM
Storage: Samsung 990 EVO Plus

Software Used

Operating System: Ubuntu 24.04
Runtime: LM Studio 0.4.12
GPU Stack: ROCm 2.13
Temperature: 0.7
Context Length: 30000

The models from 2026 have thinking disabled. Look at swedish_benchmark_fast for benchmarks with thinking enabled.

Leaderboard

Overall score and per-benchmark task score are benchmark-normalized combinations of raw task quality and latency using the slider weights. Missing coverage reduces the final score. Accuracy is shown separately as the raw correctness rate.
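The weighting described above can be sketched roughly as follows. This is an illustrative sketch and not the dashboard's actual code: the function name `overall_scores`, the input layout, the latency normalization (fastest latency in the benchmark divided by the model's latency), and the treatment of a missing benchmark as a score of zero are all assumptions. Only the 75% / 25% weights come from the page itself.

```python
def overall_scores(per_benchmark, w_task=0.75, w_latency=0.25):
    """Combine task quality and latency into one score per model.

    per_benchmark maps benchmark name -> {model name -> (quality, latency_s)},
    where quality is assumed to lie in [0, 1] and latency_s is assumed > 0.
    """
    totals = {}
    for results in per_benchmark.values():
        # Assumed normalization: the fastest model in a benchmark gets 1.0.
        fastest = min(latency for _, latency in results.values())
        for model, (quality, latency) in results.items():
            score = w_task * quality + w_latency * (fastest / latency)
            totals[model] = totals.get(model, 0.0) + score

    # Dividing by the total number of benchmarks means a benchmark a model
    # did not run contributes zero, so missing coverage lowers its score.
    n_benchmarks = len(per_benchmark)
    return {model: total / n_benchmarks for model, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    example = {
        "culture": {"model_a": (1.0, 2.0), "model_b": (0.5, 1.0)},
        "economy": {"model_a": (0.8, 4.0)},  # model_b has no result here
    }
    for model, score in sorted(overall_scores(example).items()):
        print(f"{model}: {score:.4f}")
```

Moving the slider corresponds to changing `w_task` and `w_latency`. The sketch assumes the two weights sum to 1.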

Leaderboard Weight

Task score: 75% Latency: 25%

Benchmarks Used

Overall Trade-Off

This view averages each model across the benchmarks used by the leaderboard. Points toward the upper left (lower latency, higher score) are better.
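One way to read such a chart is to average each model's points and then see which models are not beaten on both axes by another model. The sketch below is illustrative only: the function names `average_points` and `pareto_front`, the input layout, and the choice to average only over benchmarks a model actually ran are assumptions, not the dashboard's implementation.

```python
def average_points(per_benchmark):
    """Average latency and score per model.

    per_benchmark maps benchmark name -> {model name -> (latency_s, score)}.
    Returns model name -> (average latency, average score).
    """
    sums = {}
    for results in per_benchmark.values():
        for model, (latency, score) in results.items():
            entry = sums.setdefault(model, [0.0, 0.0, 0])
            entry[0] += latency
            entry[1] += score
            entry[2] += 1
    return {m: (lat / n, sc / n) for m, (lat, sc, n) in sums.items()}


def pareto_front(points):
    """Return models that no other model beats on both latency and score."""
    front = []
    for model, (latency, score) in points.items():
        dominated = any(
            other_lat <= latency
            and other_score >= score
            and (other_lat < latency or other_score > score)
            for other, (other_lat, other_score) in points.items()
            if other != model
        )
        if not dominated:
            front.append(model)
    return sorted(front)


if __name__ == "__main__":
    # Hypothetical averaged points: (latency in seconds, score).
    points = {"model_a": (1.0, 0.5), "model_b": (2.0, 0.9), "model_c": (3.0, 0.8)}
    print(pareto_front(points))
```

In the hypothetical data, `model_c` is both slower and lower-scoring than `model_b`, so it sits below and to the right of it on the chart and is left out of the front.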

Benchmark Charts

Each scatter plot shows one benchmark. The x-axis is average latency in seconds and the y-axis is the latency-adjusted task score.

Benchmark Tables

Compact per-benchmark summaries for the currently selected models.

Benchmark Notes

Short descriptions for the benchmarks shown above.