
The speed of improvement of TTS models reminds me of the early days of Stable Diffusion. Can't wait until I can generate audiobooks without infinite pain. If I were an investor I'd short Audible.




An all-TTS audiobook offering is just about as appealing as an all-stable-diffusion picture gallery (that is, not at all).

There are already audiobooks on Audible that are 100% TTS. While they're playable, they're no substitute (yet) for a real human.

It's just too flat/dead compared to a human reader.


Isn’t it more like an art gallery of prints of paintings? The primary art is the text of the book (like the painting in the gallery), TTS (and printing a copy) are just methods of making the art available.

I think it can be argued that audiobooks add to the art through the reader's tone and inflection.

To me, what you're saying is the same as saying the art of a movie is in the script, and the video is just the method of making it available. And I don't think that's a valid take.


No, that's an incorrect analogy. The script of a movie is an intermediate step in the production process of a movie. It's generally not meant to be seen by audiences. The script, for example, doesn't contain any cinematography, soundtrack, or performances by actors. Meanwhile, a written work is a complete expressive work ready for consumption. It doesn't contain a voice, but that's because the intention is for the reader to interpret the voice into it. A voice actor can do that, but that's just an interpretation of the work. It's not one-to-one, but it's not unlike someone sitting next to you in the theater and telling you what they think a scene means.

So yes, I mostly agree with GP. An audiobook is a different rendering of the same subject. The content is in the text, regardless of whether it's delivered in written or oral form.


I've moved to https://github.com/readest/readest over audiobooks in most cases. I just need the dang thing in my ears and their TTS is good enough.

It's not perfect, but I already have a setup for doing this on my phone. Add SherpaTTS and Librera Reader to your phone (both are available free on F-Droid).

Set up SherpaTTS as the voice model for your phone (I like the en_GB-jenny_dioco-medium voice option, but there are several to choose from). Add an ebook to Librera Reader and open it. There's an icon with a little person wearing headphones, which lets you send the text continuously to your phone's TTS, using just local processing on the phone. I don't have the latest phone, but mine is able to process it faster than the audio is read, so the audio doesn't stop and start.

The voice isn't totally human sounding, but it's a lot better than the Microsoft Sam days, and once you get used to it the roboticness fades into the background and you can just listen to the story. You may get better results with Kokoro (I couldn't get it running on my phone) or similar TTS engines and a more powerful phone.

One thing I like about this setup is that if you want to swap back and forth between audio and text, you can. The reader scrolls automatically as it makes the audio, and you can pause it, read in silence for a while yourself and later set it going from a new point.
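For anyone curious what "send the text continuously" and "faster than the audio is read" amount to, here is a rough Python sketch. It is not how Librera Reader or SherpaTTS actually work; `split_into_chunks` and `real_time_factor` are hypothetical helper names, and the 200-character chunk size is an arbitrary assumption. The idea is just that the text gets cut into sentence-sized pieces, and playback stays gapless as long as each piece is synthesized in less time than it takes to play.

```python
import re
from typing import Iterator


def split_into_chunks(text: str, max_chars: int = 200) -> Iterator[str]:
    """Yield sentence-aligned chunks of roughly max_chars characters.

    A single sentence longer than max_chars is yielded on its own
    rather than being cut mid-sentence.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunk = ""
    for sentence in sentences:
        if not sentence:
            continue
        if chunk and len(chunk) + 1 + len(sentence) > max_chars:
            yield chunk
            chunk = sentence
        else:
            chunk = f"{chunk} {sentence}" if chunk else sentence
    if chunk:
        yield chunk


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Time spent synthesizing divided by the length of audio produced.

    A value below 1.0 means the engine keeps ahead of playback.
    """
    return synthesis_seconds / audio_seconds
```

If you time your own engine on a chunk and the factor comes out under 1.0, the audio shouldn't stop and start; above 1.0, you'd need to buffer ahead or use a lighter voice model.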


I feel like TTS is one of the areas that has evolved the least. Small TTS models have been around for 5+ years and they've only gotten incrementally better. Giants like ElevenLabs make good-sounding TTS, but it's not quite human yet and the improvements get smaller with each iteration.

Wouldn't Audible be perfectly positioned to take advantage of this? They have the perfect setup to integrate this into their offering.

It seems more likely that people will buy a digital copy of the book for a few bucks and then run the TTS themselves on devices they already own.

Not likely at all. People pay for convenience; they don't want to do that themselves.

eBooks are much more expensive than an Audible subscription though.

I wouldn't say so. Audible gives you 1 book a month for $15. Most e-books I see are around $10.


