Show HN: TTS-API - Text-to-speech API

yoda_sl · on Nov 11, 2012

Quite cool... But if this is generated by the text-to-speech engine from OS X, then I am afraid it goes beyond the license that comes with OS X. I remember reading through that license, and it clearly stated that the OS X TTS was only for local use on your Mac.

So I am extremely curious about the license behind this tts-api. Can the OP provide that info, or share some of the tech behind it?

mherdeg · on Nov 11, 2012

In case anyone else is curious, the section you're thinking of is

"F. Voices. Subject to the terms and conditions of this License, you may use the system voices included in the Apple Software (“System Voices”) (i) while running the Apple Software and (ii) to create your own original content and projects for your personal, non-commercial use. No other use of the System Voices is permitted by this License, including but not limited to the use, reproduction, display, performance, recording, publishing or redistribution of any of the System Voices in a profit, non-profit, public sharing or commercial context."

yoda_sl · on Nov 11, 2012

Thank you! Exactly what I was referring to.

bencevans · on Nov 11, 2012

There's an unofficial Google API that does the same job, which people may be interested in: http://translate.google.com/translate_tts?tl=en&q=Hello+...

duiker101 · on Nov 11, 2012

limited to 150 characters IIRC
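If that limit holds, longer text would have to be split across several requests. A minimal sketch in Python, assuming the 150-character figure recalled above (unverified) and using a hypothetical helper name, `chunk_text`:

```python
import textwrap


def chunk_text(text, limit=150):
    """Split text into pieces of at most `limit` characters, breaking at whitespace.

    `limit=150` is the figure recalled in this thread, not a documented value.
    """
    # textwrap.wrap collapses runs of whitespace and only splits inside a
    # word when that single word is longer than `limit`.
    return textwrap.wrap(text, width=limit)


if __name__ == "__main__":
    sample = " ".join(["word"] * 100)
    for piece in chunk_text(sample):
        print(len(piece), piece[:20])
```

Each piece could then be sent as its own request and the resulting audio concatenated. How well the pieces join at the seams is a separate question.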

kajecounterhack · on Nov 11, 2012

Bing's got an API -- you need to sign up for a key, but you get a very large number of free uses and I don't think there's a character limit.

riobard · on Nov 11, 2012

For those who want to run their own copy of this, here's how to do it:

1. Find a Mac-based server (a co-located Mac Mini will be fine)

2. Run `say -o output.wav $TEXT` to generate the voice

3. Compress the WAVE file with `lame` or the system built-in `afconvert` to get the MP3 file.

The `say` command supports multiple languages and dialects, but you'll have to install the necessary voice engines in OS X 10.8. The man page for `say` can be found here: http://pastebin.com/nWbvJAAX
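The steps above can be sketched in Python as follows. This is an illustration under assumptions, not the code behind tts-api.com: the function names are made up, it assumes `say` and `lame` are both on the PATH, and it writes AIFF (the default output format of `say`, as far as I know) rather than WAV before handing the file to `lame`.

```python
import subprocess


def say_command(text, out_path, voice=None):
    """Build the argv for the OS X `say` command, writing speech to `out_path`."""
    cmd = ["say"]
    if voice:
        cmd += ["-v", voice]  # e.g. "Alex" or "Fred"
    cmd += ["-o", out_path, text]
    return cmd


def lame_command(in_path, mp3_path):
    """Build the argv for `lame`, which encodes the input file to MP3."""
    return ["lame", in_path, mp3_path]


def synthesize_mp3(text, mp3_path, voice=None, tmp_path="output.aiff"):
    """Run both steps; raises CalledProcessError if either command fails."""
    # Passing argv lists (no shell) means the text is never split on
    # spaces or interpreted by a shell.
    subprocess.run(say_command(text, tmp_path, voice), check=True)
    subprocess.run(lame_command(tmp_path, mp3_path), check=True)
```

For example, `synthesize_mp3("Hello world", "hello.mp3", voice="Alex")` would run `say -v Alex -o output.aiff "Hello world"` followed by `lame output.aiff hello.mp3`. Text that begins with a dash may be read by `say` as an option, so a real service would want to guard against that.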

The complete list of voices/languages supported so far:

* English (Australia): 2 voices

* English (India): 1 voice

* English (Ireland): 1 voice

* English (Scottish): 1 voice

* English (South Africa): 1 voice

* English (UK): 3 voices

* English (US - Female): 7 voices

* English (US - Male): 6 voices

* English (US - Novelty): 14 voices

* Arabic (Saudi Arabia): 1 voice

* Chinese (China): 1 voice

* Chinese (HK): 1 voice

* Chinese (Taiwan): 1 voice

* Czech: 1 voice

* Danish: 1 voice

* Dutch (Belgium): 1 voice

* Dutch (Netherlands): 2 voices

* Finnish: 1 voice

* French (Canada): 2 voices

* French (France): 4 voices

* German (Germany): 3 voices

* Greek: 2 voices

* Hindi: 1 voice

* Hungarian: 1 voice

* Indonesian: 1 voice

* Italian: 3 voices

* Japanese: 1 voice

* Korean: 2 voices

* Norwegian Bokmål: 1 voice

* Polish: 1 voice

* Portuguese (Brazil): 1 voice

* Portuguese (Portugal): 1 voice

* Romanian: 1 voice

* Russian: 1 voice

* Slovak: 1 voice

* Spanish (Mexico): 2 voices

* Spanish (Spain): 2 voices

* Swedish: 2 voices

* Thai: 1 voice

* Turkish: 1 voice

snoonan · on Nov 11, 2012

Careful, though. This is expressly against the license agreement for Mac OS X.

jiggy2011 · on Nov 11, 2012

If you use Ubuntu or similar there is also eSpeak

    espeak -w output.wav 'I love jiggy'

est · on Nov 12, 2012

or on Windows

new file, paste

    createobject("sapi.spvoice").speak("hi world")

save as 1.vbs, then double click it.

microtherion · on Nov 12, 2012

I'm pretty sure if you have lame installed, say can output directly to mp3.

Toshio · on Nov 11, 2012

The open-source, cross-platform equivalent of `say` is a piece of software called "SVOX Pico". There is also a Python-based wrapper for it called picospeaker. Relevant AUR link for ArchLinux users:

https://aur.archlinux.org/packages/picospeaker/

EDIT: SVOX Pico is a component of the Android OS.

abava · on Nov 11, 2012

Russian does not work: http://tts-api.com/tts.mp3?q=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%...

lefthansolo · on Nov 11, 2012

This is a nice one; however, I'm still confounded by the lack of progress since Bell Labs made an online text-to-speech converter many years ago. In particular, the notion that the interpretation of each sentence is idempotent is just wrong. Want to see what I mean? A human would not speak like the following; there should be differences in intonation, "emotion" (sounding bored, angry, excited, etc., varying with the number of times "dogs" has been said), speed, and delay. In addition, you have to breathe at some point, and even the best audiobooks have some level of breath noise.

http://tts-api.com/tts.mp3?q=dogs.%20dogs.%20dogs.%20dogs.%2....

username3 · on Nov 12, 2012

It takes a breath for blank lines or new paragraphs. http://tts-api.com/tts.mp3?q=High%20Quality%0AWe%20believe%2...!
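The links in this thread all follow the same pattern: the text goes into the `q` parameter, percent-encoded, with spaces as `%20` and newlines as `%0A`. A minimal sketch of building such a URL in Python, assuming the endpoint shown in the links above and a hypothetical helper name, `tts_url`:

```python
from urllib.parse import quote

# Endpoint as it appears in the links posted in this thread.
BASE_URL = "http://tts-api.com/tts.mp3"


def tts_url(text):
    """Return a URL with `text` percent-encoded in the `q` parameter."""
    # safe="" encodes every reserved character: spaces become %20,
    # newlines become %0A, and non-ASCII text is encoded as UTF-8 bytes.
    return BASE_URL + "?q=" + quote(text, safe="")


if __name__ == "__main__":
    print(tts_url("dogs. dogs."))
```

Here `tts_url("dogs. dogs.")` gives `http://tts-api.com/tts.mp3?q=dogs.%20dogs.`, and a blank line in the text is just two consecutive `%0A` sequences.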

atombender · on Nov 11, 2012

This is a bit off topic, but a related question: I have been looking for a "bad" text-to-speech library that produces Stephen Hawking-style audio, similar to what's found in old 1970s/80s electronics. Examples:

http://www.youtube.com/watch?v=gh0fBwiE4cE

http://www.youtube.com/watch?v=vvYvCaAN3Jg

Anyone?

kellishaver · on Nov 12, 2012

On OS X, the "Fred" voice is pretty close.

    say -v Fred Hello. My name is Fred.

atombender · on Nov 12, 2012

Thanks, but I need a library I can use in an app (and not just on OS X).

snoonan · on Nov 11, 2012

Excellent and dead easy to use. Great work on making it simple.

I was actually looking for a similar API like this just a few hours ago, but with some other languages as well. What's the TTS engine driving this?

BTW, one small critique on the page copy: "You expect" could be expressed more politely and in terms of the user's POV/benefit.

stcredzero · on Nov 11, 2012

> Excellent and dead easy to use. Great work on making it simple.

The acronym should reflect this ease of use for proper pronunciation. How about Text Intelligently To Speech?

franze · on Nov 11, 2012

just wanted to add: you can now do this all in the browser (100% client side), too -> http://lalo.li/ (it's forkable)

archangel_one · on Nov 11, 2012

Well, not really; the quality is nothing like as good. It's a nifty trick being able to do synthesis at all in Javascript, though.

And, of course, it presumably doesn't have the licensing issues this other approach would appear to, if it really is using Apple's voices.

alexmunroe · on Nov 11, 2012

Pretty impressive. I gave it a go with a few of the more technical terms that I come across at work and that other TTS engines have difficulty handling, and it pronounced them flawlessly. Very interested to see where this goes!

po84 · on Nov 11, 2012

http://syntensity.com/static/espeak.html

If you're looking for a client side solution, here's espeak compiled to JS using emscripten.

ninjin · on Nov 11, 2012

Neat, once again, emscripten proves useful. I do find it important though to point out the lack of a good open text-to-speech engine.

Here is a speech as rendered by tts-api.com (http://goo.gl/PoZc4). For comparison, paste the first few paragraphs from here [2] into speak.js [1] and compare the quality of the two.

There really is a gap to fill for a good open-source alternative here. But I suspect the main barrier is that there is a large amount of data needed to generate good voices. Still, a worthy target.

[1]: I tried to make a URL for this too, but despite the URL looking as if it could take arguments it refused to work, at least for me under Firefox and Chrome.

[2]: http://www.nytimes.com/2008/09/25/business/worldbusiness/25i...

tantalor · on Nov 11, 2012

Does it support IPA or SSML[1]? I ask because AT&T's TTS API[2] does, but it kind of sucks!

For example,

  <phoneme alphabet="ipa" ph="/ˈkreɪp/"></phoneme>

[1] http://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Languag...

[2] http://www2.research.att.com/~ttsweb/tts/demo.php
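For anyone who wants to generate markup like this programmatically, here is a small Python sketch that builds an SSML `<phoneme>` element. The helper name is hypothetical. As I understand the SSML spec, the element wraps the text being pronounced and `ph` carries the phoneme string itself, so the sketch passes the IPA without the surrounding slashes. Whether tts-api.com or AT&T's API accepts this is not something the thread establishes.

```python
import xml.etree.ElementTree as ET


def phoneme(text, ipa):
    """Return an SSML <phoneme> element for `text`, serialized as a string."""
    # alphabet="ipa" marks `ph` as International Phonetic Alphabet symbols.
    el = ET.Element("phoneme", {"alphabet": "ipa", "ph": ipa})
    el.text = text  # fallback text for engines that ignore `ph`
    return ET.tostring(el, encoding="unicode")


if __name__ == "__main__":
    print(phoneme("crepe", "ˈkreɪp"))
```

Using an XML library rather than string formatting means quotes, ampersands, and angle brackets in the text are escaped correctly.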

ChrisKelly · on Nov 11, 2012

You might be able to do non-English pronunciations by trying phonetic spellings, which can be tricky. The best I could get for "felicidades" was this: fell isseedadesh.

microtherion · on Nov 12, 2012

Yes, it supports the subset of IPA that more or less maps to English phonemes (Getting a voice talent to produce clicks would be a neat exercise).

dholowiski · on Nov 11, 2012

Where are the voices from, and how are they licensed? Could I use the output from this for commercial purposes?

elbuo8 · on Nov 12, 2012

Nodejs module available at: https://npmjs.org/package/node-tts-api https://github.com/elbuo8/node-tts-api

Enjoy

dave84 · on Nov 11, 2012

Sounds like Alex from OS X.

babebridou · on Nov 11, 2012

Great, simply great API that just works. Keep up the good work!

What I would love to see now is the ability to send compressed text to shorten the URL.

chrisallick · on Nov 12, 2012

http://clubsexytime.com/projects/tweader/

twitter tracker using tts :)

thanks, such an amazing service

jeffehobbs · on Nov 11, 2012

Is this piping to the Mac OS X "say" CLI command? Neat. I'd love to see the source behind this, if you felt like putting it on Github.

d0vs · on Nov 11, 2012

Wow, very impressed!

Try "Hello.", "Hello!" and "Hello?"

sinzone · on Nov 11, 2012

Hi, would like to have this API on Mashape.com ... I think our community of developers will like it.

KwanEsq · on Nov 11, 2012

Very nice. Would be good to have the option of other formats, specifically Vorbis and/or Opus.

savrajsingh · on Nov 11, 2012

Awesome! Will you release a speech-to-text API as well, or know of a good one? Thanks!

taf2 · on Nov 11, 2012

I've had some success with the web API provided by AT&T: http://developer.att.com/developer/forward.jsp?passedItemId=... It's in beta, I believe.

yati · on Nov 11, 2012

Great job! Something that was really needed. I would love to see this open sourced :)

morgangiraud · on Nov 11, 2012

Very smooth and simple, as it should be. Are you going to implement other languages?

leoplct · on Nov 11, 2012

Would be great if you could share your code on Github

bussetta · on Nov 11, 2012

App idea! Now listen to your tweets from timeline!

chrisallick · on Nov 12, 2012

http://clubsexytime.com/projects/tweader/ done

shritesh · on Nov 11, 2012

Thank you :) Really needed something like this!

codegeek · on Nov 11, 2012

Pretty good. A nice-to-have would be to let the audio play in the browser as well, instead of just having a link.

davecap1 · on Nov 11, 2012

What about a speech-to-text API?

kajecounterhack · on Nov 11, 2012

Speech to text is a far more computationally difficult problem. Google has an unofficial one -- you can curl FLAC voice files to them, but even their transcription is not terrific. (They use it for automatic captions on YouTube -- use that to judge...)