
I completely agree with your assessment, but the problem is a bit worse in my opinion. We already have a pretty firm grasp of how different ML systems learn and converge towards a solution in the average case. It's not that we need to understand our neural networks better, it's that we need to understand our problem domain better. We can't determine how well some ML architecture will perform at an object recognition problem without some math describing object recognition. This makes things a lot more complicated, because it means we have to do a lot more work to understand every single application where we want to use ML.

And, of course, if we had some really good mathematical framework for describing and reasoning about object recognition, we probably wouldn't need to turn to ML to solve it ;)



The whole point of Deep Learning is that we don't want to describe the math behind object recognition; that was the failed "classical" approach, where people spent decades hand-crafting complex features that worked horribly. Deep Learning is actually pretty simple, well understood and parallelizable, and it's basically a billion-dimensional non-linear optimization. As optimization is infested with NP-hard problems, it's as difficult as it gets. It's actually amazing what we can do with it in the real world right now (and we are still far away from seeing all its fruits). Of course, it frustrates academics who can't base AGI on top of it, but did they really think this approach would do it anyway?


Deep learning does not seem to abstract very well. Train on a data set, then test with images that are simply upside down, and the performance drop can be significant.

Feature extraction also works much better when you toss a lot of data and processing power behind it. So, a lot of progress is simply more data and computing power vs. better approaches. Consider how poorly deep learning works when using a single 286.


> Deep learning does not seem to abstract very well. Train on a data set, then test with images that are simply upside down, and the performance drop can be significant.

But that's true of people too. How quickly can you read upside-down?

If you trained on a mixture of upside-down and right way up images, and tested on upside-down images, performance wouldn't take that much of a hit.
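
For concreteness, here's a toy sketch of building that kind of mixed training set (my own illustration, not anyone's actual pipeline; it assumes the images are plain (C, H, W) tensors):

    import torch

    def with_upside_down_copies(dataset):
        # Add a 180-degree-rotated copy of every (image, label) pair,
        # assuming images are (C, H, W) tensors.
        flipped = [(torch.rot90(x, k=2, dims=(1, 2)), y) for x, y in dataset]
        return list(dataset) + flipped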


> But that's true of people too.

Sure, the problem is we are more willing to ignore failures that are similar to how we fail. IMO, when we compare AI approach X vs. Y, we need to consider absolute performance, not just performance relative to human performance.

Deep learning for example gains a lot from texture detection in images. But, that also makes it really easy to fool.


While I can't easily read upside-down text, I can instantly recognize it as not only text, but text that needs to be flipped upside down in order to be read. That's something current "deep learning" AIs can't do reliably, if at all.

If I had to describe the root cause of this problem it would be that humans process "problems" rather than "things" and we "learn" by building an ever growing mental library of problem solving algorithms. As we continue to "learn", we refine our problem solving algorithms to be more general than specific. Compare that to a deep learning AI that learns by building an ever greater data library of things while refining algorithms to suit ever more specific use cases.


I think you're describing a level of generalization above the application at hand. We could easily train a neural network to recognize the orientation of a font, and then build an orientation-invariant "reading" app by first recognizing the rotation of the text, transforming it so it is right side up, and then recognizing it as normal.
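
Something like this rough sketch, where orientation_net and reader_net are hypothetical models of mine rather than anything from a particular library:

    import torch

    def read_any_orientation(image, orientation_net, reader_net):
        # Stage 1: classify the rotation as 0, 90, 180 or 270 degrees
        # (image is assumed to be a (C, H, W) tensor).
        k = int(orientation_net(image.unsqueeze(0)).argmax(dim=1))
        # Stage 2: undo the rotation, then run the ordinary recognizer.
        upright = torch.rot90(image, k=-k, dims=(1, 2))
        return reader_net(upright.unsqueeze(0))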

I tend to imagine our brains work similarly. It's not that you have a single "network" in your brain that recognizes text from all angles; rather, your brain is a "general purpose" machine with many networks that work together. I think current deep learning techniques are great for discrete tasks, and the improvement needed is to have many networks that work together properly, with some form of intuition as to what should be done with the information at hand.


There is work on rotation invariant CNNs, but I'm not sure why you would expect that property to just fall out of standard CNNs.

As much as architecture research gets denigrated these days, MLPs aren't what set off the revolution.


> It's not that we need to understand our neural networks better, it's that we need to understand our problem domain better.

How 'bout "creating models that can work with more dimensions of the problem domain than are conveyed by standard data labeling"?

I mean, we don't simply want AI but actually "need" it, in the sense that problems like biological systems are too complex to understand without artificial enhancements to our comprehension processes; thus, to "understand the problem domain better", we need AI. If it's true that "to build AI, we need to understand the problem domain better", it leaves us stuck in a chicken-and-egg problem. That might be the case, but if we're going to find a way out, we are going to need to build tools the way humans have solved problems many times before.


It will probably play out like a conversation. A data scientist trains an ML model, and in analyzing the results discovers some intrinsic property or invariant of the problem domain. The scientist can then encode that information into the model and retrain. And that goes on and on, each time providing more accurate results.
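
Sketched as a loop, purely hypothetically (train_fn, inspect_fn and encode_fn are stand-ins for whatever the real steps would look like):

    def refine(model, data, train_fn, inspect_fn, encode_fn, rounds=5):
        for _ in range(rounds):
            train_fn(model, data)
            finding = inspect_fn(model, data)   # e.g. "labels ignore rotation"
            if finding is None:                 # nothing new discovered; stop
                break
            model, data = encode_fn(finding, model, data)
        return model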

As an aside, I think it's important that we find a way to examine and inspect how an ML model "works". If you have some neural network that does really well at the problem, it would be nice if you could somehow peer into it and explain, in human terms, what insight the model has made into the problem. That might not be feasible with neural networks, as they're really just a bunch of weights in a matrix, but this is practical for something like decision trees. Just food for thought.
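
To make the decision-tree case concrete: with scikit-learn, for instance, the learned rules can be dumped as readable text (the iris data below is just a stand-in problem):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=3).fit(iris.data, iris.target)
    # Prints the tree as nested if/else rules over the named features.
    print(export_text(tree, feature_names=list(iris.feature_names)))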


This is somewhat practical for neural networks. For example, instead of minimizing the loss function, why not tweak the input to maximize a neuron’s activation? Or with a CNN, maximize the sum of a kernel’s channel? This would tell us what the neuron corresponds with. This is what Google did with DeepDream.

An explanation/tutorial, with clean images of the process: https://github.com/tensorflow/tensorflow/blob/r0.10/tensorfl...

Google’s investigation of its GoogLeNet architecture: http://storage.googleapis.com/deepdream/visualz/tensorflow_i...

Now, I say somewhat because the results can be visually confusing, e.g. Google's analysis. Even then, we can see the progression of layer complexity as we go deeper into the network. Plus, we can see mixed4b_5x5_bottleneck_pre_relu has kernels that seem to correspond with noses and eyes, and mixed_4d_5x5_pre_relu has a kernel that seems to correspond with cat faces.
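
A rough sketch of that gradient-ascent idea in PyTorch, if anyone wants to play with it (the layer and channel are arbitrary picks of mine, not what Google used):

    import torch
    from torchvision.models import googlenet, GoogLeNet_Weights

    model = googlenet(weights=GoogLeNet_Weights.DEFAULT).eval()
    for p in model.parameters():
        p.requires_grad_(False)

    # Capture the output of one intermediate layer with a forward hook.
    acts = {}
    model.inception4b.register_forward_hook(lambda m, i, o: acts.update(out=o))

    # Start from noise and ascend the gradient of one channel's activation.
    img = torch.randn(1, 3, 224, 224, requires_grad=True)
    opt = torch.optim.Adam([img], lr=0.05)
    for _ in range(200):
        opt.zero_grad()
        model(img)
        loss = -acts["out"][0, 37].mean()   # maximize channel 37's mean activation
        loss.backward()
        opt.step()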


> A data scientist trains an ML model, and in analyzing the results discovers some intrinsic property or invariant of the problem domain. The scientist can then encode that information into the model and retrain. And that goes on and on, each time providing more accurate results.

Mmmaybe,

It's tricky to articulate what pattern the data scientist could see ... that an automated system couldn't see. Or otherwise, perhaps the whole "loop" could be automated. Or possibly the original neural network already finds all the patterns available, and what's left can't be interpreted.


The human participant may consider multiple distinct machine results, each a point in the space of algorithm, data set, and bias applied to the problem domain. Human intuition is injected into the process, and the result will be greater than the sum of the machines and a lone human mind.

What is interesting to note, now that the above idea is on the table, is that this process model itself belongs to the set of human-machine coordinations. Another process model is one where low-level human cognition is used to perform recognition tasks too hard (or too slow) for a machine to perform, for example using porn surfers to perform computation tasks via captcha-like puzzles.

The long-term social ramifications of all this are also interesting to consider, as it motivates machines to breed distinct types of humans ;)


I imagine you need the data scientist to discern semantically relevant signals from irrelevant ones. How else do you “tell” your model what to look for? You could easily end up training an irrelevant but well-fitting model.


It's worth remembering that many times progress is held back by ideas that "aren't even wrong." The Perceptrons book wasn't wrong; it just attacked the wrong questions with an inadequate level of certainty in its assumptions. It may be that we feel we understand where machine learning is at now, but actually have a huge amount to learn because of inadequacies that we aren't even aware of.





