petrochukm's comments | Hacker News

WellSaid Labs | Seattle | ONSITE | https://wellsaidlabs.com/

WellSaid Labs uses deep generative models to create hyperrealistic voice-overs for high-quality media content: books (i.e. audiobooks), videos, assistive devices, call centers, video games, resurrected celebrities, etc. The voice-over market alone is worth $5 billion.

We have also secured substantial seed funding from top-tier VCs and are building out our founding team. Finally, we are a spin-out from the Allen Institute for Artificial Intelligence (a.k.a. Paul Allen's AI Lab).

You'll work in one of these roles:

- Full stack engineer (React / Node.js / GCP)

- Deep learning engineer/researcher (PyTorch / Python)

- Deep learning performance engineer (C++)

With WellSaid Labs, you'll help build one of the first commercial products with deep learning at its core.

Email michael[at]wellsaidlabs[dot]com to apply.

----------------------------------

PRESS:

https://techcrunch.com/2019/03/07/wellsaid-aims-to-make-natu....

https://www.geekwire.com/2019/ai2s-incubator-gives-birth-wel....


Doubt it.

Generative-adversarial models have had a lot of success in image generation; however, the same cannot be said for speech synthesis.

Unless they have figured out a new technique, they are probably using Tacotron 2 (https://ai.googleblog.com/2017/12/tacotron-2-generating-huma...). Google's Tacotron 2 already achieved human-parity TTS, as measured by MOS (mean opinion score), without adversarial training.
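For reference, a MOS is just the average of listener naturalness ratings on a 1-5 scale, usually reported with a confidence interval. A minimal sketch, using made-up ratings (real evaluations like Tacotron 2's aggregate many listeners over many utterances):

```python
import math

def mean_opinion_score(ratings):
    """Average of 1-5 naturalness ratings, with a 95% confidence interval."""
    n = len(ratings)
    mean = sum(ratings) / n
    # Sample variance, then a normal-approximation 95% interval.
    var = sum((r - mean) ** 2 for r in ratings) / (n - 1)
    ci95 = 1.96 * math.sqrt(var / n)
    return mean, ci95

# Hypothetical listener ratings for one synthesized clip.
mos, ci = mean_opinion_score([5, 4, 5, 4, 4, 5, 3, 4])
print(f"MOS = {mos:.2f} ± {ci:.2f}")  # MOS = 4.25 ± 0.49
```

"Human parity" claims compare the synthesized audio's MOS against the MOS of real recordings under the same test; overlapping confidence intervals are the usual bar.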


For context, it's important to know that these are probably cherry-picked samples. The authors make no mention of selecting them at random. For as long as text-to-speech has existed, there have been impressive demos backed by cherry-picking.

The three Dessa team members probably did not create anything innovative in three months of work. Rayhane Mamah, one of them, had previously published an implementation of Tacotron 2 (Google's 2017 research) (https://github.com/Rayhane-mamah/Tacotron-2) that has noise/distortion and intonation/prosody issues similar to their "RealTalk" model.

Following on the above, Google's TTS research already demonstrated human parity, as measured by MOS, in early 2018. That research was deployed as Google Duplex in mid-2018.

Google's TTS research also showed the deficiencies of this technology. Short of AGI, TTS models do not understand the underlying text; therefore, they are unable to do more "complex things with intonation/prosody". Furthermore, the models suffer from overfitting: performance degrades significantly when synthesizing text unlike what is typically seen in the training data.


Hey!

> I honestly wonder if we are a year or two away from this being possible.

We launched a service for adding voice-overs two months ago: https://wellsaidlabs.com/

Techcrunch: https://techcrunch.com/2019/03/07/wellsaid-aims-to-make-natu...

GeekWire: https://www.geekwire.com/2019/ai2s-incubator-gives-birth-wel...

> How can I do this with my voice?

To prevent abuse of our technology, we need to review your use case before creating a custom voice for you.

> The google WaveNet stuff is pretty good but still not there yet [1].

Google WaveNet is not built for high-quality voice-overs but rather for cheap, fast text-to-speech.


Hey...

By the way, we have launched a product specifically targeted at that use case: https://wellsaidlabs.com/

Just sayin'

We're hiring! michael[at]wellsaidlabs[dot]com


WellSaid Labs | Seattle | ONSITE | https://wellsaidlabs.com/

WellSaid Labs uses deep generative models to create hyperrealistic voice-overs for books (i.e. audiobooks), videos, assistive devices, call centers, video games, etc.

Our launch:

https://techcrunch.com/2019/03/07/wellsaid-aims-to-make-natu...

https://www.geekwire.com/2019/ai2s-incubator-gives-birth-wel...

We have also secured substantial seed funding from top-tier VCs and are building out our founding team. We are a spin-out from the Allen Institute for Artificial Intelligence (a.k.a. Paul Allen's AI Lab).

You'll work in one of these roles:

- Full stack engineer

- Infrastructure engineer

- Deep learning engineer / researcher

- Deep learning performance engineer

You'll pioneer the first commercial deep generative model editor.

Email michael[at]wellsaidlabs[dot]com

