
I wonder if you could use the same technique (treating model weights as ROM rather than RAM) for something like Whisper speech-to-text, where the models are much smaller (around a gigabyte), to build a super-efficient single-chip speech-recognition solution with tons of context knowledge.
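As a rough sanity check on the "around a gigabyte" figure, here's a back-of-envelope sketch of the read-only storage a baked-in Whisper model would need. The parameter counts come from the openai/whisper README; the bytes-per-parameter figures are assumed quantization widths, not anything specified by the project:

```python
# Back-of-envelope ROM footprint for baking Whisper weights into a chip.
# Parameter counts per the openai/whisper README; byte widths per parameter
# are assumed quantization levels (fp16/int8/int4), not an official spec.
WHISPER_PARAMS = {
    "tiny": 39e6,
    "base": 74e6,
    "small": 244e6,
    "medium": 769e6,
    "large": 1550e6,
}
BYTES_PER_PARAM = {"fp16": 2, "int8": 1, "int4": 0.5}

def rom_bytes(model: str, quant: str) -> int:
    """Estimated read-only storage for the weights alone (no activations)."""
    return int(WHISPER_PARAMS[model] * BYTES_PER_PARAM[quant])

for model in WHISPER_PARAMS:
    gb = rom_bytes(model, "int8") / 1e9
    print(f"{model:>6}: ~{gb:.2f} GB at int8")
```

At int8, even the large model fits in about 1.6 GB of ROM, which is roughly the "around a gigabyte" ballpark; medium at fp16 also lands near 1.5 GB.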


Right now I have to wait 10 minutes at a time for the 2+ hour transcriptions I've uploaded to Voxstral to process. The speed-up here could be immense, and worthwhile to many customers of these products.





