> Tortoise is a bit tongue in cheek: this model is insanely slow. It leverages both an autoregressive decoder and a diffusion decoder; both known for their low sampling rates. On a NVidia Tesla K80, expect to generate a medium sized sentence every 2 minutes.
I suspect that for a real(-ish) time TTS system, something else is needed. OTOH if you want to record some voice acting for a game or other multimedia product, it still may be more cost-effective than recording a bunch of live humans.
(K80 = NVidia Tesla K80, a GPU; $800-900 for a 24GB version right now.)
Would it still require a 3080 to run adequately, that is, with 1-2 seconds of delay? I've no idea what consumer-grade hardware works well for ML loads.
Kepler, Maxwell, Pascal, Volta, Turing, Ampere, then Ada Lovelace and Hopper. The K80 (Kepler) is 6 generations old when you include the microarchitectures. It would be about a 10x improvement.
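To put numbers on that: a back-of-the-envelope sketch, assuming the quoted 2 minutes per sentence on a K80 and taking the 10x figure at face value (it is a rough guess from this thread, not a benchmark). The function name and constants are made up for illustration.

```python
def estimated_latency(baseline_seconds: float, speedup: float) -> float:
    """Scale a measured latency by an assumed speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_seconds / speedup


K80_SECONDS_PER_SENTENCE = 120.0  # "every 2 minutes", from the quoted README
ASSUMED_SPEEDUP = 10.0            # rough guess from this thread, not measured
TARGET_SECONDS = 2.0              # upper end of the 1-2 second delay asked about

latency = estimated_latency(K80_SECONDS_PER_SENTENCE, ASSUMED_SPEEDUP)
meets_target = latency <= TARGET_SECONDS
print(f"{latency:.0f} s per sentence; meets target: {meets_target}")
```

So even with a 10x speedup you would land around 12 seconds per sentence, still well short of a 1-2 second delay.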