Hacker News | DarkMeld's comments

That's totally awesome and completely fine in my book. Props (Cheers!) to whoever thought of the idea.


Stay humble my friend.


What does it mean when a codec is "lossless"?


This is in no way rigorous, but hopefully the analogy helps.

Imagine you want to encode something like y = 2 * sin(2 * x) + .1 * sin(10 * x) to save space.

http://www.wolframalpha.com/input/?i=sin%28x%29+%2B+.1*sin%2...

That is composed of two different sine waves added together - one low-frequency, large-amplitude and one high-frequency, low-amplitude. That signal can be encoded in two different ways - lossy or lossless.

A lossy encoder gets rid of unimportant data and keeps a "close enough" representation of the original. In this example, .1 * sin(10 * x) is a minor component of the overall signal (plot them separately if you need to compare the difference), so the encoder chooses to delete it and only save 2 * sin(2 * x). In a real-world sound, this would be like throwing away noise that is so high-pitched we can't hear it, or so quiet we can't detect it. A real encoder has to decide what the "small enough that it can be deleted" threshold is, and it's rarely black and white.

A lossless encoder looks at that signal and thinks "there has got to be a more efficient way to store that data". They are both sine functions, so that doesn't need to be repeated twice. "x" is completely unnecessary, because it knows the sine functions are dependent on some variable. So, it could write out something like "sin (2,2) (.1,10)". All of the original data is still there, if whoever receives the data (the decoder) knows how to interpret it.
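As a toy sketch of that analogy (not how any real codec works; the names and the component list here are made up), the parametric idea could look like:

```python
import math

# Hypothetical "encoder" output: store only (amplitude, frequency)
# pairs instead of thousands of sampled y-values.
components = [(2.0, 2.0), (0.1, 10.0)]  # 2*sin(2x) + 0.1*sin(10x)

def decode(components, x):
    # Lossless: reconstruct the signal exactly from the stored parameters.
    return sum(a * math.sin(f * x) for a, f in components)

def decode_lossy(components, x, threshold=0.5):
    # Lossy: drop components whose amplitude falls below a threshold.
    return sum(a * math.sin(f * x) for a, f in components if a >= threshold)

x = 1.234
exact = 2 * math.sin(2 * x) + 0.1 * math.sin(10 * x)
assert abs(decode(components, x) - exact) < 1e-9  # exact round trip
```

The lossy decoder throws away the small .1 * sin(10 * x) term, so its output differs slightly from the original, which is the whole trade-off in one line.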


Note: A Fourier transform is never lossless - it's an approximation (usually a fairly good one, though).


Not necessarily true. If you have a particular bitstream that approximates a waveform, and you use a Fourier transform to generate an approximation with sufficient accuracy that it reproduces the same bitstream, you have something lossless.

Also, even if you don't go that far, many lossless algorithms operate by first providing an approximation and then including the (usually small) deltas between the approximation and the actual bitstream.
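A minimal sketch of that predict-plus-residual idea, using a trivial "previous sample" predictor of my own choosing (real codecs such as FLAC use higher-order linear prediction, but the principle is the same):

```python
def encode(samples):
    # Predict each sample as the previous one; store the first sample
    # plus the (usually small) deltas. Nothing is thrown away.
    return [samples[0]] + [samples[i] - samples[i - 1]
                           for i in range(1, len(samples))]

def decode(deltas):
    # Undo the prediction by accumulating the deltas.
    samples = [deltas[0]]
    for d in deltas[1:]:
        samples.append(samples[-1] + d)
    return samples

data = [100, 101, 103, 102, 104]
assert decode(encode(data)) == data  # bit-exact reconstruction
```

The deltas are smaller numbers than the raw samples, so a subsequent entropy coder can store them in fewer bits, yet the reconstruction is still bit-exact.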


The Fourier transform itself is an exact mathematical transformation with an exact inverse transform. There's nothing lossy about it.


Mathematically, you are correct. Practically, you are not, if you consider the source of the transform.


What do you mean? If the original source is analog and sampled below its Nyquist rate in the analog-to-digital conversion, the process is indeed irreversible. But that all happens before any transforms from the time domain to the frequency domain are in play, so it's a separate issue.

Beyond that, discrete Fourier and cosine transforms as usually implemented are not fully reversible because of loss in precision. A colleague of mine blogged about the issue in the context of Haar transforms a few years ago: http://cbloomrants.blogspot.com/2008/09/09-08-08-1.html. By decomposing an orthogonal transform into shears as explained by Charles, you can design reversible fixed-precision variants of the DCT like binDCT: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.8...


Sorry, it is a separate issue - I should have been clearer.

The original point the OP made was a bad example. Unfortunately I made a poor attempt at explaining that.


That's the principle, but not how it really works. As you said, it's only an analogy, but one that comes close enough to sounding like the real thing that it is potentially confusing.

Audio codecs don't try to find functions that describe waves, they sample the waves.

What you get from a microphone or an electric guitar is a changing voltage, a wave of some sort. To digitize that wave (i.e. to save it using numbers and not vinyl discs or magnetic tape) you take samples (i.e. measure the voltage) at regular intervals. You typically might do that 48k times or 96k times per second (i.e. have a sampling frequency of 48kHz or 96kHz).

What you then do with those samples is quantize them, i.e. each becomes an integer depending on how high or low the voltage is, for example a 16-bit number.

That's how an analog-to-digital converter works on a basic level (you also need a filter to get rid of all the frequencies you don't want to digitize). This process is of course lossy. It's not possible to reconstruct the wave that was digitized perfectly; the result is only an approximation. All analog-to-digital converters are lossy (in some sense of the word). If you sample at 48kHz, all frequencies higher than half of that (24kHz) are irrevocably lost (look up the Nyquist-Shannon sampling theorem if you want to know more). Quantization also means that you have to round the voltage you are measuring up or down; the audible effect this has on the sound is that it adds noise.
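As a rough illustration of the sampling-and-quantization steps above - a toy software "ADC" with invented names; real converters do this in hardware, including the anti-aliasing filter:

```python
import math

SAMPLE_RATE = 48_000   # samples per second (48kHz)
BIT_DEPTH = 16

def sample_and_quantize(signal, seconds=0.001):
    max_int = 2 ** (BIT_DEPTH - 1) - 1  # 32767 for 16-bit audio
    samples = []
    for n in range(int(SAMPLE_RATE * seconds)):
        t = n / SAMPLE_RATE                 # measure at regular intervals
        v = signal(t)                       # the "voltage", in [-1.0, 1.0]
        samples.append(round(v * max_int))  # rounding = quantization noise
    return samples

# A 1kHz sine tone standing in for the analog input.
tone = lambda t: math.sin(2 * math.pi * 1000 * t)
pcm = sample_and_quantize(tone)  # 48 integer samples for one millisecond
```

Everything the rest of the comment talks about operates on lists of integers like `pcm`; the continuous voltage curve between the sample points is gone.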

The parameters are, however, usually chosen in such a way that a human ear can't tell the difference. A healthy and young ear can hear frequencies up to about 20kHz, that's why the music on CDs is sampled at 44.1kHz (40kHz with some safety margin).

But that's only the conversion and it has not much to do with lossy codecs.

A lossless codec takes the result of this digitization (let's say 48k 16-bit numbers for a one-second sound) and encodes them in such a way that those exact same 48k 16-bit numbers can be reconstructed after decoding.
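That round-trip guarantee can be demonstrated with any general-purpose lossless compressor; here zlib stands in for a real audio codec like FLAC (the sample values are arbitrary):

```python
import struct
import zlib

samples = [0, 1000, -1000, 32767, -32768, 5, 5, 5]  # 16-bit PCM values

# Pack the samples as little-endian signed 16-bit integers, compress,
# then decompress and unpack again.
raw = struct.pack(f"<{len(samples)}h", *samples)
compressed = zlib.compress(raw)
restored = struct.unpack(f"<{len(samples)}h", zlib.decompress(compressed))

assert list(restored) == samples  # exactly the same numbers come back
```

A dedicated audio codec compresses better than zlib because it models the signal (prediction, residual coding), but the contract is identical: decode(encode(x)) == x, bit for bit.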

Re-encoding a sound file with a lossless codec can consequently still be lossy: if you change the parameters (e.g. you halve the sampling rate) you are still losing information - but it's at least possible to preserve the result of the original digitization perfectly.

A lossy codec doesn't aim for perfect reconstruction of all the numbers. It uses quirks in human hearing to be more or less precise about how those numbers are reconstructed. For example: A really loud sound masks quieter sounds. It's not possible for humans to hear that quieter sound, so why bother with all that precision for that quiet sound? That's where, for example, MP3 can dynamically decrease the quality. It's quite ingenious, really.

It is of course inevitable that information is lost during that process. It's not possible to get those original 48k numbers back - only something that sounds more or less like them to a human ear.


> that's why the music on CDs is sampled at 44.1kHz (40kHz with some safety margin)

A nice explanation of the 44.1kHz is here: http://www.cs.columbia.edu/~hgs/audio/44.1.html

Initially, digital audio was recorded on video equipment, so the sample rate was constrained by the video formats commonly in use and the playback devices that came with them.


It means that there's no loss of fidelity. If you convert raw audio (a WAV for instance) to MP3, vorbis, or other lossy formats and then convert it back, you'll end up with different (lower fidelity) data. Doing the same with FLAC or another lossless format gives you the exact same data back.

In the image space, PNGs are likewise lossless, whereas JPEG causes a loss of fidelity.


Another word for lossless is "reversible", i.e. you can get back out what you put in, which isn't the case with MP3, but is with ZIP or FLAC. If you don't worry about it being bit-exact, though, you can make the file a fair bit smaller, particularly if any of the original data is difficult or impossible to hear.


Think of it more like RAR or ZIP compression: file size is reduced, but no data is lost.

Unlike MP3, which uses a number of lossy compression techniques, including psychoacoustics, that are specifically designed to reduce the bitrate of an audio stream with minimal artifacts.


Lossless: zip/unzip.

Lossy: jpeg.


Or:

Lossless: PNG

Lossy: JPEG


"No one likes Photoshop..."

I fucking love Photoshop


I have a really hard time getting along with Photoshop as a web design tool. For touching up photos it's fantastic, but I far prefer Illustrator, or even better, Fireworks. Fireworks in particular has a great vector/raster hybrid that I love.

But I'm sure I would get used to Photoshop if I had to.


I’ve become used to Photoshop because I’ve had to (moved to a different company) but it’s seriously not as good as Fireworks for web.


I think the coolness of hating Photoshop comes from Photoshop on the Mac being a second-class citizen. The Windows version of Photoshop is better in every way. It is very stable and has an excellent UI (though a few of the older dialogs could use an update).

The Mac version, on the other hand, is a pain in the ass to use. I operate at about 70% efficiency on the Mac compared to the Windows version.


Do you have any specifics? The UIs are very similar. They don't use the OS widgets on either platform, to keep things consistent, and if you know one you know the other. So much so that I'm not too much of a fan of the Mac UI, because it feels like a Windows app.


That is interesting; it's the first time I've heard that. I've been using Photoshop on Windows since at least 1999 and cannot imagine life without it. It never crashes (everything else I use does).


Same here. I'm thankful for the brains that went into developing such a tool, just think about the amount of research, mathematics, and algorithms required to do even some of the simple tasks in Photoshop.


Having to loop through arrays to translate data into HTML elements is the majority of the problem. I'd be happy with just a solution to that.


I'm not sure of the accuracy of Compete's site analytics.


I'm absolutely sure they're woefully inaccurate. We've compared their data to our own GA data and their numbers are so far off it's laughable - or it would be if we weren't spending an arm and a leg to get them.


I have compared Compete's numbers to actual data from about 150 sites.

On that sample the numbers were never higher than actual and on average the Compete numbers were 2-3x lower than the actual numbers.

So I have used it as a reasonable estimate for half of the traffic a site sees. Although I may be putting my foot in my mouth, because the stats for HN don't quite seem accurate: http://siteanalytics.compete.com/news.ycombinator.com/


Some of my sites receive 10 to 100 times more traffic than reported by compete.


Are they sites with less than 20,000 uniques per month? I've found those to be the outliers in my data set. They're wildly off on "low" traffic sites but have been a decent yardstick for sites getting more than 20k uniques in my limited experience.


No, more like 50,000-150,000 (depending on the month).


Compete is a train wreck. I wouldn't trust it, even for comparative analysis, at least based on the sites I've sampled.


Would you mind expanding on this? Why do you think it's a train wreck, and where would you recommend checking for analytics?

Thanks!


I just noticed how incredibly fast Facebook search is


It's not that fast: Facebook actually starts the search query at the second letter (about 174 ms); before that, it uses a prefetched cache of first-degree friends and apps.

