Updated Apr 8, 2026

Transcription Accuracy Leaderboard: ASR & AI Proofreading on Real Audio

We benchmark leading ASR engines and audio proofreading models on real-world conversational, legal, and healthcare recordings. Models are scored by N-WER (normalized word error rate) and cpWER (concatenated minimum-permutation word error rate, which also counts speaker attribution errors).

Leaderboard — N-WER ↓

Model | Avg N-WER ↓ | RTFx ↑ | License | Golden (8) | Bench (7) | E2wK
google/gemini-3-flash-preview 🥇 Best | 6.63% | ~1x | Proprietary | 7.37% | 6.82% | 5.71%
google/gemma-4-E4B-it | 10.43% | 6x | Open | 10.49% | 11.11% | 9.48%
assemblyai/universal (raw) | 11.44% | n/a | Proprietary | 11.44% | 11.70% | n/a
CohereLabs/cohere-transcribe-03-2026 | 19.59% | 35x | Open | 19.59% | n/a | n/a
XiaomiMiMo/MiMo-Audio-7B-Instruct | 42.43% | 10x | Open | 42.43% | n/a | n/a

Leaderboard — cpWER ↓

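cpWER scores word errors and speaker attribution jointly: each speaker's words are concatenated, hypothesis speakers are matched to reference speakers using the permutation with the fewest errors, and the total errors are divided by the number of reference words. Lower is better. Cohere is excluded because it does not produce speaker labels (no diarization).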

Model | Avg cpWER ↓ | License | Golden (8) | Bench (7) | E2wK
google/gemma-4-E4B-it 🥇 Best | 12.54% | Open | 12.54% | 14.91% | 53.55%
assemblyai/universal (raw) | 12.95% | Proprietary | 12.95% | n/a | n/a
google/gemini-3-flash-preview | 16.63% | Proprietary | 16.63% | 14.97% | ~29.0%
XiaomiMiMo/MiMo-Audio-7B-Instruct | 33.74% | Open | 33.74% | n/a | n/a
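To make the cpWER definition concrete, here is a minimal sketch, assuming each speaker's words are already tokenized and concatenated in time order. The function names `edit_distance` and `cpwer` are illustrative, not Scribie's scoring code. It brute-forces every speaker permutation, which only works for a handful of speakers; for something like the 28-speaker E2wK meeting you would solve the same cost matrix with an assignment solver (for example the Hungarian algorithm) instead.

```python
from itertools import permutations


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref word missing from hyp
                curr[j - 1] + 1,         # insertion: extra hyp word
                prev[j - 1] + (r != h),  # substitution, or a free match
            )
        prev = curr
    return prev[-1]


def cpwer(ref: dict[str, list[str]], hyp: dict[str, list[str]]) -> float:
    """Concatenated minimum-permutation WER.

    ref and hyp map a speaker label to that speaker's concatenated words.
    Hypothesis labels are arbitrary, so every one-to-one mapping of hyp
    speakers onto ref speakers is tried and the cheapest one is kept.
    """
    ref_spk = list(ref.values())
    hyp_spk = list(hyp.values())
    n = max(len(ref_spk), len(hyp_spk))
    # Pad the shorter side with empty speakers: an unmatched ref speaker costs
    # all of its words as deletions, an unmatched hyp speaker all as insertions.
    ref_spk += [[] for _ in range(n - len(ref_spk))]
    hyp_spk += [[] for _ in range(n - len(hyp_spk))]
    cost = [[edit_distance(r, h) for h in hyp_spk] for r in ref_spk]
    best = min(sum(cost[i][p[i]] for i in range(n)) for p in permutations(range(n)))
    return best / sum(len(words) for words in ref_spk)


# Every word is right, but all of them are attributed to a single speaker.
ref = {"A": ["yes", "no"], "B": ["maybe"]}
hyp = {"spk0": ["yes", "no", "maybe"]}
print(cpwer(ref, hyp))  # 2 errors / 3 reference words, about 0.667
```

The example shows why a model can post a low N-WER and a much higher cpWER: plain WER on the word sequence is 0 here, while cpWER charges for the misattributed words.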

Per-File Results — Golden Set N-WER

Model | fPTn | de21 | rx7S | I0Tb | THnj | yeF0 | 5qLa | ag8P | AVG
google/gemini-3-flash-preview | 5.48 | 4.37 | 6.55 | 7.34 | 7.50 | 10.31 | 8.98 | 8.42 | 7.37
google/gemma-4-E4B-it | 6.76 | 5.35 | 6.76 | 13.70 | 13.08 | 8.46 | 15.77 | 14.06 | 10.49
assemblyai/universal (raw) | 6.57 | 6.59 | 7.53 | 15.37 | 15.91 | 7.66 | 17.04 | 14.87 | 11.44
CohereLabs/cohere-transcribe | 12.10 | 10.53 | 12.92 | 15.26 | 15.87 | 17.13 | 18.16 | 54.74 | 19.59
XiaomiMiMo/MiMo-Audio-7B | 39.65 | 6.44 | 75.18 | 50.31 | 36.55 | 25.20 | 57.62 | 48.52 | 42.43

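The AVG column is consistent with an unweighted (macro) mean of the eight per-file scores: re-averaging each row reproduces every AVG value, and those values match the Golden (8) column of the N-WER leaderboard. A quick check, with the numbers copied from the table above (`PER_FILE_NWER` and `macro_average` are illustrative names):

```python
from statistics import mean

# Golden Set N-WER (%) per file, copied from the per-file table.
# Column order: fPTn, de21, rx7S, I0Tb, THnj, yeF0, 5qLa, ag8P.
PER_FILE_NWER = {
    "google/gemini-3-flash-preview": [5.48, 4.37, 6.55, 7.34, 7.50, 10.31, 8.98, 8.42],
    "google/gemma-4-E4B-it": [6.76, 5.35, 6.76, 13.70, 13.08, 8.46, 15.77, 14.06],
    "assemblyai/universal (raw)": [6.57, 6.59, 7.53, 15.37, 15.91, 7.66, 17.04, 14.87],
    "CohereLabs/cohere-transcribe": [12.10, 10.53, 12.92, 15.26, 15.87, 17.13, 18.16, 54.74],
    "XiaomiMiMo/MiMo-Audio-7B": [39.65, 6.44, 75.18, 50.31, 36.55, 25.20, 57.62, 48.52],
}


def macro_average(scores: dict[str, list[float]]) -> dict[str, float]:
    """Unweighted mean per model: every file counts once, whatever its length."""
    return {model: round(mean(values), 2) for model, values in scores.items()}


for model, avg in macro_average(PER_FILE_NWER).items():
    print(f"{model}: {avg:.2f}")
# google/gemini-3-flash-preview: 7.37
# google/gemma-4-E4B-it: 10.49
# assemblyai/universal (raw): 11.44
# CohereLabs/cohere-transcribe: 19.59
# XiaomiMiMo/MiMo-Audio-7B: 42.43
```

Because each file counts once, the 17-minute yeF0 file weighs as much as the 172-minute THnj deposition; a duration-weighted average would produce different numbers.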
Datasets

The Golden Set is our primary and hardest benchmark: legal depositions, a healthcare interview, and files with up to 6 speakers. We use it for final model evaluation.

The Benchmark Test Set is broader and closer to our typical workload: mostly general conversational content with 2–3 speakers.

The E2wK Stress Test is a single 4-hour board meeting with 28 speakers, designed to test extreme multi-speaker handling.

All sets are strictly held out from training, tuning, and prompt development. The holdout is enforced programmatically.
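The page does not say how the holdout is enforced. One hypothetical way such a check could look is a guard that refuses any training, tuning, or prompt-development manifest containing a held-out file ID. The names `find_leaks` and `check_manifest` are invented for illustration. The IDs are the Golden Set files listed on this page; Benchmark and E2wK IDs are not published here.

```python
from collections.abc import Iterable

# Hypothetical guard, for illustration only; not Scribie's actual mechanism.
# Golden Set file IDs as listed on this page (Bench and E2wK IDs not published).
GOLDEN_SET_IDS = frozenset({
    "fPTnbIxCoXrz", "de217fa950c3", "rx7SAmlqi2ZB", "I0Tb5h9VWliu",
    "THnjjF5Sy991", "yeF0szcOajip", "5qLa4TMcApPj", "ag8PupPUuwvj",
})


def find_leaks(file_ids: Iterable[str], held_out: frozenset[str] = GOLDEN_SET_IDS) -> list[str]:
    """Return the held-out IDs that appear in a manifest, sorted."""
    return sorted(held_out.intersection(file_ids))


def check_manifest(file_ids: Iterable[str]) -> None:
    """Raise before any training or tuning run that would touch held-out audio."""
    leaked = find_leaks(file_ids)
    if leaked:
        raise ValueError(f"held-out evaluation files in manifest: {leaked}")


check_manifest(["train_0001", "train_0002"])  # passes silently
```

Running a check like this at manifest-build time, rather than trusting convention, is what makes a holdout "programmatic": a leaked file stops the run instead of silently inflating scores.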

Golden Set — 8 files, ~8.5 hours

ID | Duration | Speakers | Domain | Content
fPTnbIxCoXrz | 44 min | 3 | Legal | Deposition: attorney, witness, court reporter
de217fa950c3 | 32 min | 2 | Healthcare | Pediatric healthcare needs interview
rx7SAmlqi2ZB | 35 min | 3 | General | Multi-party conversation
I0Tb5h9VWliu | 44 min | 2 | General | Two-speaker interview
THnjjF5Sy991 | 172 min | 6 | Legal | Long deposition: multiple attorneys, rapid Q&A
yeF0szcOajip | 17 min | 4 | General | Short multi-speaker discussion
5qLa4TMcApPj | 57 min | 2 | General | Extended two-party conversation
ag8PupPUuwvj | 115 min | 3 | General | Long recording, similar-sounding speakers
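Summing the Duration column shows how the Golden Set's audio is distributed. The rows below are copied from the table; `GOLDEN_SET` is just an illustrative name.

```python
from collections import defaultdict

# (file ID, duration in minutes, speakers, domain), copied from the Golden Set table.
GOLDEN_SET = [
    ("fPTnbIxCoXrz", 44, 3, "Legal"),
    ("de217fa950c3", 32, 2, "Healthcare"),
    ("rx7SAmlqi2ZB", 35, 3, "General"),
    ("I0Tb5h9VWliu", 44, 2, "General"),
    ("THnjjF5Sy991", 172, 6, "Legal"),
    ("yeF0szcOajip", 17, 4, "General"),
    ("5qLa4TMcApPj", 57, 2, "General"),
    ("ag8PupPUuwvj", 115, 3, "General"),
]

total_min = sum(minutes for _, minutes, _, _ in GOLDEN_SET)
minutes_by_domain = defaultdict(int)
for _, minutes, _, domain in GOLDEN_SET:
    minutes_by_domain[domain] += minutes

print(f"{total_min} min = {total_min / 60:.1f} h")  # 516 min = 8.6 h
for domain, minutes in minutes_by_domain.items():
    print(f"{domain}: {minutes} min ({minutes / total_min:.0%})")
# Legal: 216 min (42%)
# Healthcare: 32 min (6%)
# General: 268 min (52%)
```

The 516-minute total is in line with the "~8.5 hours" in the heading. The two legal depositions alone account for roughly 42% of the audio.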

Model Cards

Model | Type | Size | Audio limit
google/gemini-3-flash-preview | Proofreader | Undisclosed | Long-form
google/gemma-4-E4B-it | Proofreader | 8B (4.5B effective) | 30 sec
assemblyai/universal | ASR | Undisclosed | Long-form
CohereLabs/cohere-transcribe | ASR | 2B | 35 sec
XiaomiMiMo/MiMo-Audio-7B | Proofreader | 8.2B | ~3 min
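The audio-limit column implies that long recordings must be split before a 30-second or 35-second model can process them. The page does not say how files are segmented. The sketch below assumes naive fixed-length, non-overlapping windows purely to show the scale involved; `fixed_windows` is a hypothetical helper, and real pipelines typically cut at silences or use overlapping windows and then stitch the outputs.

```python
import math


def fixed_windows(duration_s: float, limit_s: float) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive windows no longer than limit_s.

    Assumption for illustration only: no overlap, no silence-aware cut points.
    """
    n = math.ceil(duration_s / limit_s)
    return [(i * limit_s, min((i + 1) * limit_s, duration_s)) for i in range(n)]


# The 172-minute THnj deposition against the audio limits in the table.
deposition_s = 172 * 60
print(len(fixed_windows(deposition_s, 30)))   # 344 windows at a 30 sec limit
print(len(fixed_windows(deposition_s, 35)))   # 295 windows at a 35 sec limit
print(len(fixed_windows(deposition_s, 180)))  # 58 windows at a ~3 min limit
```

Every window boundary is a place where a word or a speaker turn can be cut, which is one plausible reason short-context models may struggle on the longest, most speaker-dense files.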

ASR

Audio → text. Pure transcription, with no existing transcript to correct.

Proofreader

Audio + ASR transcript → corrected transcript. Listens to audio and fixes ASR errors.
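The two model types differ only in their inputs, and a proofreader consumes an ASR model's output. A sketch using `typing.Protocol`; the method names `transcribe` and `proofread` and the stub classes are hypothetical, not any vendor's API.

```python
from typing import Protocol


class ASRModel(Protocol):
    def transcribe(self, audio: bytes) -> str:
        """Audio -> text, from scratch."""
        ...


class Proofreader(Protocol):
    def proofread(self, audio: bytes, draft: str) -> str:
        """Audio + ASR draft -> corrected transcript."""
        ...


def two_stage(audio: bytes, asr: ASRModel, proofreader: Proofreader) -> str:
    """Run ASR, then let the proofreader listen to the same audio and fix the draft."""
    draft = asr.transcribe(audio)
    return proofreader.proofread(audio, draft)


class StubASR:
    def transcribe(self, audio: bytes) -> str:
        return "the quick brown fax"  # stand-in ASR output with one error


class StubProofreader:
    def proofread(self, audio: bytes, draft: str) -> str:
        return draft.replace("fax", "fox")  # stand-in for an audio-aware correction


print(two_stage(b"", StubASR(), StubProofreader()))  # the quick brown fox
```

Structural typing keeps the two roles interchangeable: any ASR engine from the leaderboard could feed any proofreader, which is how mixed pairings can be compared.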

Metrics

Metric | What it measures | Direction
N-WER | Word error rate after normalization (lowercased, punctuation stripped) | Lower is better
cpWER | Word errors and speaker attribution errors jointly (optimal speaker permutation) | Lower is better
RTFx | Processing speed (audio minutes / wall-clock minutes) | Higher is faster
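A minimal sketch of N-WER and RTFx as defined in the table. It assumes normalization is only the two steps named (lowercasing and stripping punctuation); production normalizers often also handle numbers, contractions, and spelling variants. The helper names are illustrative.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation, split on whitespace (assumed N-WER normalization)."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h))
        prev = curr
    return prev[-1]


def n_wer(reference: str, hypothesis: str) -> float:
    """Errors after normalization divided by the number of reference words."""
    ref = normalize(reference)
    return edit_distance(ref, normalize(hypothesis)) / len(ref)


def rtfx(audio_minutes: float, wall_clock_minutes: float) -> float:
    """Inverse real-time factor: minutes of audio processed per wall-clock minute."""
    return audio_minutes / wall_clock_minutes


print(n_wer("Did you sign the contract?", "did you sign a contract"))  # 0.2
print(rtfx(60, 10))  # 6.0
```

Case and the question mark cost nothing after normalization; only "the" → "a" counts, giving 1 error in 5 reference words. On the RTFx side, a model listed at 6x (such as google/gemma-4-E4B-it) would need about 10 minutes of wall-clock time per hour of audio.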


Maintained by the Superproofer team at Scribie. All evaluation files are held out and never used for training.