Major improvements to audio at Zencoder, and why this matters

DarkShikari · on July 20, 2011

General nitpick for articles like this: if you improved X about your service by switching from application A to application B, simply say it. Phrasing like that used in this article implies (falsely) that Zencoder wrote their own encoder.

"We made our web server 5 times faster!" might sound amazing, but if you really mean "we switched from Apache to Nginx", it's misleading.

Also, FAAC is widely known to be pretty bad. Even LAME MP3 is quite a bit better at 128kbps, to say nothing of Vorbis, good AAC encoders like Nero or Apple, and Opus/CELT.

jon_dahl · on July 20, 2011

Yep, we didn't write our own encoder - we evaluated and licensed a third party encoder. Article edited to make this more clear. The problem (at this point at least) is that our software licensing agreement with the third party includes confidentiality, so we can't name the software without permission. Stupid, I know. But you can probably guess, or figure out, what it is. ;)

wmf · on July 20, 2011

I've seen confidentiality the other way (where a vendor can't say who their customers are) but why would someone not want a customer to tell the world how great their software is?

sanswork · on July 21, 2011

The only reason I can think of is that if the customer messes up the implementation and creates a subpar service with it, then it reflects poorly on the source provider.

Actually another reason might be that the source provider doesn't want other users to know about each other to compare licensing agreements and such.

nfriedly · on July 21, 2011

I'd guess it was written by a lawyer who was more concerned about a customer telling the world how great their software isn't. And a blanket "do not mention" statement is probably safer than a "do not mention negatively".

mikeryan · on July 21, 2011

I get what you're saying, but Zencoder's product isn't an encoder; it's a cloud-based encoding service (we're happy customers). As a consumer, the actual implementation is a black box. As an engineer, I appreciate the process they went through and also appreciate that they explained the way they got there.

If Heroku switched from Apache to Nginx and saw a marked improvement in performance, I'm fine with them saying "We're serving pages 5x faster".

aidenn0 · on July 20, 2011

I thought it was a well-known fact that FAAC produced very bad output. Or at least well known by anyone who might be working for a company for whom AV compression is all they do.

jon_dahl · on July 20, 2011

I'd say this is well known with anyone working with open source video tools. And yet it's still in use all over the place, sadly.

steveh73 · on July 20, 2011

So after all that, what encoder are you using?

leoh · on July 20, 2011

Which AAC encoder does iTunes use? Is it considered pretty good?

jon_dahl · on July 20, 2011

Yes - iTunes uses QuickTime, and QuickTime has one of the better AAC encoders. Unfortunately, it's not easily executable (or licensable) on Linux.

bgentry · on July 20, 2011

Can you comment on which actual encoder you're now using, or do you consider that a secret?

jon_dahl · on July 21, 2011

Yep, unfortunately we can't right now. Maybe down the road?

jodrellblank · on July 21, 2011

Looks like a secret: http://news.ycombinator.com/item?id=2787570 (a comment elsewhere on this page).

DarkShikari · on July 20, 2011

They use their own encoder. Apple's encoder is considered one of the best (slightly better than Nero).

kierank · on July 20, 2011

Apple licensed an encoder. I forget the company that made it - it's mentioned in the dll somewhere.

nitrogen · on July 21, 2011

I found the ringing in the new 16kHz sample to be a bit bothersome. It sounds like there's a tonal peak in the frequency response near the upper end of the frequency range. The peak would be less bothersome if it were spread out more.

Edit: is the new resampling done by something like Shibatch SSRC or libsamplerate's SRC?

jon_dahl · on July 21, 2011

It sounds a bit better outside of Flash, actually. Flash prefers 22050 or 44100 audio and doesn't do a great job of resampling others up to 44100 (which I think is what happens behind the scenes). Give it a listen in Chrome or some such and see what you think.

We aren't using those libraries right now, but we'll check them out. How do they compare to something like SoX?
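As a rough illustration of the sample-rate point above (not a description of what Flash or Zencoder actually does internally): a rational resampler going from one rate to another upsamples by L and decimates by M, where L/M is the reduced ratio of the two rates. The sketch below just computes that ratio for a few source rates against a 44100 Hz target; the function name `resample_ratio` is made up for this example.

```python
from fractions import Fraction


def resample_ratio(src_hz: int, dst_hz: int) -> Fraction:
    """Return dst/src as a reduced fraction.

    The numerator is the upsampling factor L and the denominator is the
    decimation factor M for a rational (L/M) resampler.
    """
    return Fraction(dst_hz, src_hz)


# Compare a few source rates against a 44100 Hz target.
for src in (16000, 22050, 44100):
    ratio = resample_ratio(src, 44100)
    print(f"{src} Hz -> 44100 Hz: up by {ratio.numerator}, down by {ratio.denominator}")
```

22050 Hz is a clean 2x step up to 44100 Hz, while 16000 Hz works out to 441/160, so it needs a real interpolation filter rather than simple doubling. Whether that is the cause of the artifacts heard through Flash is a guess, not something confirmed in this thread.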

nitrogen · on July 22, 2011

I haven't used SoX extensively since Mandriva was called Mandrake. It looks like the newest version of SoX has a very good converter. The comparison graphs I just found and linked below show that the different converters seem to optimize for different parameters; Shibatch SSRC appears to focus on passband width, while libsamplerate's Secret Rabbit Code has better time domain response. If you're using SoX VHQ, you probably can't do much better, unless you find that its time domain response causes undesirable artifacts.

As for the 16kHz sample, listening to the .m4a files directly did give much better results than playing them through Flash.

Anyway, good luck with Zencoder.

http://sox.sourceforge.net/SoX/Resampling

http://src.infinitewave.ca/

emrosenf · on July 20, 2011

This is awesome Jon! What audio processing tricks did you use?

jon_dahl · on July 23, 2011

Sorry for the slow response. For the most part, it's not tricks, at least for now - more like using the best available software for resampling, channel mapping, etc. We don't (for instance) automatically mess with normalization/gain, though users can do that through our API.

jrockway · on July 20, 2011

Can I just have lossless audio, please?

jon_dahl · on July 21, 2011

Sure. FLAC is great, and it's on our roadmap.

We're in a funny place right now where both higher bitrate and lower bitrate video/audio are becoming increasingly important. Low bitrate because of mobile video and adaptive bitrate streaming, and high bitrate because bandwidth is getting cheaper.