In this dashboard, the 2026 models are evaluated with thinking enabled.
The overall score is benchmark-normalized: on each benchmark, models are scored relative to the other models on that task, and the normalized scores are then averaged. Missing benchmark coverage reduces the final score. For classification tasks the primary score is accuracy; for SweParaphrase it is Pearson correlation.
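A minimal sketch of this computation in Python. The min-max normalization and the handling of missing coverage (uncovered benchmarks contribute 0 to the average) are assumptions; the dashboard may use a different scheme, and all names here are hypothetical.

```python
def overall_scores(results: dict[str, dict[str, float]],
                   benchmarks: list[str]) -> dict[str, float]:
    """results[model][benchmark] -> primary score (accuracy or Pearson r)."""
    normalized: dict[str, dict[str, float]] = {m: {} for m in results}

    # Normalize each benchmark across the models that cover it (assumed min-max).
    for bench in benchmarks:
        scores = {m: r[bench] for m, r in results.items() if bench in r}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for model, s in scores.items():
            normalized[model][bench] = (s - lo) / span

    # Average over *all* benchmarks, so missing coverage lowers the final score.
    return {
        model: sum(norm.get(b, 0.0) for b in benchmarks) / len(benchmarks)
        for model, norm in normalized.items()
    }
```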
This view plots each model's average across the benchmarks used by the leaderboard. Up and to the left is better (higher score, lower latency).
Each scatter plot shows one benchmark. The x-axis is average latency in seconds and the y-axis is that benchmark's primary score.
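A minimal sketch of one such per-benchmark panel, assuming matplotlib and hypothetical field names (`model`, `latency_s`, `score`) on the result records; the dashboard's actual rendering may differ.

```python
import matplotlib.pyplot as plt

def plot_benchmark(name: str, rows: list[dict]) -> None:
    """rows: one dict per model, e.g. {"model": ..., "latency_s": ..., "score": ...}."""
    fig, ax = plt.subplots()
    ax.scatter([r["latency_s"] for r in rows], [r["score"] for r in rows])
    for r in rows:
        ax.annotate(r["model"], (r["latency_s"], r["score"]))
    ax.set_xlabel("Average latency (s)")    # left (lower latency) is better
    ax.set_ylabel(f"{name} primary score")  # up (higher score) is better
    ax.set_title(name)
    plt.show()
```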
Compact per-benchmark summaries for the currently selected models.
Short descriptions for the benchmarks shown above.