
It doesn't even need to be user-guided. Use videos that have audio. You could have one AI that generates a transcript from the audio and video, and another that watches the video on mute and tries to read the lips. Feedback would then come from the AI that had access to the audio.
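A rough sketch of what that feedback signal could look like, assuming both models output plain-text transcripts. The function names `word_error_rate` and `feedback` are made up for illustration, and word error rate is just one plausible way to score the lip reader against the audio transcript, not something the setup requires.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def feedback(audio_transcript: str, lip_guess: str) -> float:
    """Score in [0, 1]; 1.0 means the lip reader matched the audio transcript."""
    return 1.0 - min(1.0, word_error_rate(audio_transcript, lip_guess))


if __name__ == "__main__":
    # Hypothetical outputs from the two models for the same clip.
    heard = "good evening and welcome to the news"
    lip_read = "good evening and welcome to the moose"
    print(feedback(heard, lip_read))
```

The score would be computed per clip and fed back to the lip-reading model as its training signal. The audio model is treated as ground truth here, which assumes its transcripts are reliable.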


I am thinking of the millions of hours of TV news. Presenters are almost always going to be in the same position in the frame, and the footage may already have high-quality transcripts.



