One of the things they point out is that the SoTA on e.g. LibriSpeech is *only* ...

lunixbochs · on Sept 21, 2022

My own experience agrees: the generally available "SOTA" models are not especially robust, and can be _extremely_ bad (>50% absolute error rate) at some tasks. I'll post some preliminary numbers in a sibling comment and look into running my full set of tests on Whisper.

It looks like Whisper is probably leaving a lot of accuracy on the table, but initially it does seem to be a lot more robust than general "SOTA" models.

For a quick comparison, Silero's accuracy charts are kind of nice because they post results for a large variety of datasets. Scroll down to the EN V6 xlarge EE model (not the xlarge CE) [1]

[1] https://github.com/snakers4/silero-models/wiki/Quality-Bench...