
indoordin0saur on Sept 10, 2024 | on: Lip Reading as a Service (Read Their Lips by Symph...

Doesn't even need to be user-guided. Use videos that have audio. You could have one AI that generates a transcript using the audio/video and another that watches the video on mute and tries to read the lips. Feedback would then be provided by the AI that had access to the audio.
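
A rough sketch of what that feedback step might look like, assuming both models just emit plain-text transcripts for the same clip. The function names here (`word_error_rate`, `training_signal`) are made up for illustration, and word error rate is only one plausible choice of feedback signal; a real system would more likely train on the transcript directly as a label.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance computed over word tokens rather than characters."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the lip-reading guess
                curr[j - 1] + 1,     # extra word in the lip-reading guess
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Fraction of reference words the hypothesis got wrong (0.0 is perfect)."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return word_edit_distance(ref_words, hyp_words) / len(ref_words)


def training_signal(audio_transcript, lip_guess):
    """Hypothetical feedback record: the audio-side transcript is treated
    as ground truth, and the lip reader is scored against it."""
    return {
        "target": audio_transcript,
        "error": word_error_rate(audio_transcript, lip_guess),
    }


if __name__ == "__main__":
    # Transcript from the model with audio vs. the guess from the muted model.
    print(training_signal("the cat sat", "the bat sat"))
```

The point of the setup is that no human labels are needed: the audio-side transcript stands in for them, so any errors it makes get passed on to the lip reader.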

0cf8612b2e1e on Sept 10, 2024

I am thinking of the millions of hours of TV news. Presenters are almost always going to be in the same position in the frame, and the footage may already have high-quality transcripts.
