EDIT: this is a genuine question as I don't have a clue. Rather than downvoting without comment, maybe downvote and let me know why my question is dumb?
Thanks, the clarification is much appreciated. I clearly overlooked that, and now that it's been pointed out it seems entirely obvious, my bad. Only took negative karma for it to click, haha.
Ironically, the other link I posted at the same time is actually speech to text. You want something like VOSK if you're looking for local machine transcription:
As for quality, I think its models are, IDK, maybe around the level that YouTube automatic captions were at two or three years ago? So well over 90% accurate, and serviceable for getting something to search for or clean up, but expect it to get a word wrong every now and then.
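If you want to put a rough number on "well over 90%" for your own audio, the usual yardstick is word error rate (WER): word-level edit distance divided by the number of words in a reference transcript. Below is a minimal sketch in plain Python. The two sentences are made up for illustration, and reading "accuracy" as 1 minus WER is my assumption, not something taken from VOSK's documentation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word out of ten.
reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown fox jumps over the lazy dock today"

wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.0%}, word accuracy: {1 - wer:.0%}")
```

Transcribe a minute or two that you've typed out by hand, feed both versions in, and you get a ballpark figure for your own recordings. Note that punctuation is not stripped here, so normalize both strings first if your transcripts differ in that respect.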
This post got downvoted, but there's a legit point here. I've found Whisper's translated speech to text to be pretty decent, certainly compared to the reported quality of the bergamot-tiny model used in the OP.
FWIW, I like Helsinki-NLP's OPUS-MT models on Hugging Face. They're worth checking out if you need machine translation and can deal with sub-Google Translate quality.