Hacker News | vedant's comments

The title of this article feels like asking "has electricity killed oil lamps?"


There is absolutely no way to achieve the impact of Henry Ford without actively trying extremely hard to be the next Henry Ford.


Why does he need to be? Just relax, Sam.


From the paper:

"All models still underfit WebText and held-out perplexity has as of yet improved given more training time."


I'm thankful for all the extremely improbable things that had to happen for me to have the experience I'm having.

I am thankful

- To have been born to loving, educated parents

- To live in a time and place with economic and social liberty

- To have a loving and growing group of friends and family around the world who live interesting and inspiring lives

- To live in a time where so many of us can travel at the speed of sound and communicate at the speed of light

- To be able to ask any question, purchase any common object, or wish to see any loved one face-to-face, and have my wish fulfilled by a vast global network of machines that do our bidding

- For the millions of seekers, of wise men and women, who created wonders from the energy of the sun and the materials of the Earth, and showed us the way through the darkness

- That after 1000 trillion creatures and eons of failed attempts, one particular hairless ape on one particular wet rock evolved a 3-pound mass of flesh into the most sophisticated computational apparatus in the universe

- That it feels like anything at all to be thinking meat.


Will people never take Betteridge's law of headlines seriously?

https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headline...


This is a heap of nonsense. Not only does this summarily dismiss the enormous challenges in digital signal processing required for removing arbitrary background audio, it exposes some confusion associated with the ideas of correlated random variables, inner products, and affine transformations.


> it exposes some confusion associated with the ideas of correlated random variables, inner products

Read this article, then come back: [1]

[1] https://en.wikipedia.org/wiki/Cross-correlation
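For a concrete sense of what cross-correlation does, here is a toy NumPy sketch (synthetic signals, not real audio): it recovers the unknown delay between a clean reference and a noisy, shifted copy by finding the peak of the cross-correlation.

```python
import numpy as np

# A clean reference signal and a delayed, noisy copy of it.
rng = np.random.default_rng(0)
signal = rng.standard_normal(1000)
delay = 37
noisy = np.concatenate([np.zeros(delay), signal])
noisy = noisy + 0.5 * rng.standard_normal(len(noisy))

# Cross-correlate; the peak position recovers the unknown delay.
corr = np.correlate(noisy, signal, mode="full")
estimated_delay = corr.argmax() - (len(signal) - 1)
print(estimated_delay)  # 37
```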


This has the smell of a comment written by someone with limited real-world experience. Simply writing down the list of problems you would have to solve to build Shazam would take an entire afternoon.

Yes, deep neural networks have proven remarkably useful for machine perception, but you would still need to collect a colossal amount of audio data, fingerprint all of it, build a low-latency processing infrastructure for making inferences, and convince a hundred million people to install your software to feed you copious real-world training data that you can use to improve model performance.
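To give a flavor of the fingerprinting step, here is a toy "loudest frequency bin per frame" landmark scheme in NumPy. It is a drastic simplification of the constellation-map idea, not Shazam's actual algorithm, and every parameter here is an arbitrary choice for illustration.

```python
import numpy as np

def fingerprint(audio: np.ndarray, frame: int = 1024, hop: int = 512) -> set:
    """Toy landmark fingerprint: take the loudest frequency bin of each
    windowed frame, then hash consecutive peak pairs into (f1, f2, dt)
    triples. A sketch only, far from a production fingerprinter."""
    frames = [audio[i:i + frame] for i in range(0, len(audio) - frame, hop)]
    spectra = [np.abs(np.fft.rfft(f * np.hanning(frame))) for f in frames]
    peaks = [int(s.argmax()) for s in spectra]
    return {(f1, f2, 1) for f1, f2 in zip(peaks, peaks[1:])}

# A pure tone at an exact FFT bin fingerprints to one repeated landmark pair.
sr = 8192
t = np.arange(2 * sr) / sr
tone = np.sin(2 * np.pi * 400 * t)  # 400 Hz lands exactly on bin 50
print(fingerprint(tone))  # {(50, 50, 1)}
```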


> and convince a hundred million people to install your software to feed you copious real-world training data that you can use to improve model performance.

That's actually the easy part. You already have the music. Distorting it by superimposing background noise is really not difficult.


Lol. When you superimpose noise, the original data is still there. When you have a FM radio playing staticky, heavily compressed music through crappy speakers in an acoustically terrible store and being captured by a terrible microphone and then being compressed, a significant amount of nonlinear distortion has taken place. That is extremely hard to model. And you would have to model it or have real data to train a neural network. Neural networks are extremely hard to train without excellent data.
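A crude software stand-in for such a chain is easy to write, which is exactly the trap: the sketch below uses hand-picked nonlinearities (soft clipping for the speaker, power-law dynamic-range compression, additive static) that bear no measured relationship to any real speaker, room, or microphone.

```python
import numpy as np

def degrade(audio: np.ndarray, seed: int = 0) -> np.ndarray:
    """Crude, hand-tuned stand-ins for a playback/capture distortion chain.
    Real speaker, room, and microphone responses are far messier."""
    rng = np.random.default_rng(seed)
    x = np.tanh(3.0 * audio)                    # speaker nonlinearity (soft clip)
    x = np.sign(x) * np.abs(x) ** 0.5           # heavy dynamic-range compression
    x = x + 0.1 * rng.standard_normal(len(x))   # FM static
    return x
```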


> playing staticky, heavily compressed music through crappy speakers in an acoustically terrible store

I think you just confirmed how easy (and cheap) it is to actually generate this data.


If by "model" you mean reproduce, why is it so hard to simulate the distortion introduced by poor FM reception? Or by the acoustics of a store?



I wonder if you could build a neural network that would do that...


The easy part? If you're Apple, Google, or Microsoft, maybe.


Can you explain?

I mean, you can easily find thousands of hours of music online. Recording background noise is easy (just go to a random bar where they are not playing music). Now simply add the two signals (you can shift them randomly to generate more data). You can also add some linear filtering if you like (just imagine random settings of an equalizer for starters).

This should give you enough data to build a proof of concept at least.
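The recipe above can be sketched in a few lines of NumPy (a toy mixer; `music` and `noise` are assumed to be 1-D arrays at the same sample rate, and the mix level and filter length are arbitrary):

```python
import numpy as np

def augment(music: np.ndarray, noise: np.ndarray, rng) -> np.ndarray:
    """One augmented training example, per the recipe above: a randomly
    shifted slice of background noise added to the music, then a random
    linear filter (a stand-in for 'random equalizer settings')."""
    shift = rng.integers(0, len(noise) - len(music))
    mixed = music + 0.5 * noise[shift:shift + len(music)]
    fir = rng.standard_normal(8) / 8.0          # short random FIR filter
    return np.convolve(mixed, fir, mode="same")

rng = np.random.default_rng(0)
music = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # placeholder "song"
noise = rng.standard_normal(48000)                           # placeholder "bar noise"
example = augment(music, noise, rng)
```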


For training:

Illegally grabbing thousands of hours of music to train a commercial model hardly qualifies as fair use. Any company you build upon that would be tainted.

For sustaining:

In addition, you'll need to keep an updated catalog of music to identify new songs against, and most uses of a service like Shazam are to find the names of songs people aren't familiar with, so that catalog needs to be very fresh.

That means you'll have to grab some sort of feed, and either engage in large-scale music piracy for commercial gain or have access to a library of songs from many disparate music providers, such as ASCAP.

Background noise:

There are literally hundreds of different background-noise environments you need to train against, dozens of common microphone configurations, plus clipping and other variations.

It's very much a problem where a proof of concept is neat but doesn't really get you anywhere.


Also, I'm not saying it's impossible or not worth doing (obviously, it's possible and worth doing), just that a few minutes of thinking and Hacker News comments are hardly going to touch the breadth of difficulties involved in getting this to work even somewhat reliably.


> use to improve model performance.

Shazam doesn't actually let you correct an answer or report an incorrect guess. They are that confident in their results, even though a guess sometimes completely misses the genre and style of the music.


Alternatively, it helps illustrate the complexity of the problems solved by natural selection over long periods of iteration.


Yeah. You (I assume, unless you are a dog on the internet) and I (definitely not a dog) are the result of something on the order of billions of iterations of the "grasping hand" problem.


One reason is that the amount of training data is many many orders of magnitude smaller.

FWIW, it seems the structure you're talking about exploiting is at the morphological and syntactic level, which modern language models tend to handle effectively. Semantics are a much harder problem.


Theravada Buddhism.

