Hacker News

How does this compare to something like Whisper?

EDIT: this is a genuine question as I don't have a clue. Rather than downvoting without comment, maybe downvote and let me know why my question is dumb?



It's translation (text -> text), not speech -> text.


Thanks, the clarification is much appreciated. I clearly overlooked that, which, now that it's pointed out, seems entirely obvious. My bad. It only took negative karma for it to click, haha.


Ironically, the other link I posted at the same time is actually speech to text. You want something like VOSK if you're looking for local machine transcription:

https://news.ycombinator.com/item?id=40027675

As for quality, I think its models are, IDK, maybe around the level that YouTube automatic captions were two or three years ago? So well over 90% accurate, and serviceable for getting something to search for or clean up, but expect it to get a word wrong every now and then.


This post got downvoted, but there's a legit point here. I've found Whisper's translated speech-to-text to be pretty decent, certainly compared to the reported quality of the bergamot-tiny used in the OP.

FWIW, I like the Helsinki-NLP OPUS models on Hugging Face; they're worth checking out if you need machine translation and can deal with sub-Google-Translate quality.



