
I still think the next breakthrough will come when we figure out how to simplify and optimize the inner portions of feedforward networks. From working with deep nets in the past, I think it is extremely likely that a lot of the inner structure ends up being superfluous. The best way to test this is to take a well-trained network, remove a neuron, retrain until the prior accuracy is reached again, and repeat until the previous accuracy can no longer be achieved. At that point you have a network that is, in theory, as simple as it can be without a loss in accuracy. This won't work for the large nets discussed in the article, since training them takes days and uses an inordinate amount of compute. So the real breakthrough will be when we come up with some mathematical technique (in my mind, something almost analogous to AVL tree rotations) that yields a set of structural simplifications you can apply to the inner structures of these nets, turning a network with thousands of weights into one with hundreds.
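The remove-retrain-repeat test described above can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not a real training setup: `prune_until_accuracy_drops` and the `ToyNet` model are hypothetical, and the toy's accuracy depends only on how many neurons remain and whether it was retrained after the last removal. A real version would wrap an actual framework's remove/retrain/evaluate steps.

```python
from dataclasses import dataclass, replace
from typing import Callable, TypeVar

Model = TypeVar("Model")


def prune_until_accuracy_drops(
    model: Model,
    remove_neuron: Callable[[Model], Model],
    retrain: Callable[[Model], Model],
    evaluate: Callable[[Model], float],
    can_prune: Callable[[Model], bool],
    tolerance: float = 0.0,
    max_retrain_rounds: int = 10,
) -> Model:
    """Greedily remove one neuron at a time while retraining can still
    recover the starting accuracy (minus an optional tolerance)."""
    target = evaluate(model) - tolerance
    while can_prune(model):
        candidate = remove_neuron(model)
        rounds = 0
        # Retrain the pruned net until it matches the target or we give up.
        while evaluate(candidate) < target and rounds < max_retrain_rounds:
            candidate = retrain(candidate)
            rounds += 1
        if evaluate(candidate) < target:
            return model  # last network that still met the target accuracy
        model = candidate
    return model


# Hypothetical toy stand-in for a real network (assumption for illustration).
@dataclass(frozen=True)
class ToyNet:
    neurons: int
    retrained: bool = True


MIN_NEURONS_FOR_FULL_ACCURACY = 12  # made-up capacity limit for the toy


def toy_evaluate(net: ToyNet) -> float:
    if net.neurons < MIN_NEURONS_FOR_FULL_ACCURACY:
        return 0.80  # too small: no amount of retraining recovers accuracy
    return 0.95 if net.retrained else 0.90  # pruning costs accuracy until retrained


def toy_remove(net: ToyNet) -> ToyNet:
    return ToyNet(net.neurons - 1, retrained=False)


def toy_retrain(net: ToyNet) -> ToyNet:
    return replace(net, retrained=True)


if __name__ == "__main__":
    pruned = prune_until_accuracy_drops(
        ToyNet(neurons=100), toy_remove, toy_retrain, toy_evaluate,
        can_prune=lambda n: n.neurons > 1,
    )
    print(pruned.neurons, toy_evaluate(pruned))  # → 12 0.95
```

Note that a greedy loop like this finds a network where no single further removal can be recovered, which is not necessarily the smallest possible one. It also makes the cost problem concrete: every iteration includes at least one full retraining.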


You should look at Frankle's work on the Lottery Ticket Hypothesis [1]; it turns out that in many cases you can remove 80+% of a network's weights and still get very similar output quality. The hypothesized reason is that, at random initialization, some subnetworks happen to draw a "winning lottery ticket": their initial weights already have structure reasonably well suited to your end task and only need further fine-tuning, while everything else mostly ends up being set dressing. That is why you can delete those other weights without a major impact on performance.
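The pruning step behind this is magnitude pruning followed by "rewinding" the surviving weights to their original initialization. Below is a minimal single-round sketch on a flat list of weights. The function names and numbers are made up for illustration. The paper repeats a train/prune/rewind cycle over several rounds on real layers rather than doing it once on a toy vector.

```python
from typing import List, Tuple


def magnitude_mask(trained: List[float], prune_fraction: float) -> List[int]:
    """Return a 0/1 mask that drops the smallest-magnitude trained weights."""
    if not 0.0 <= prune_fraction < 1.0:
        raise ValueError("prune_fraction must be in [0, 1)")
    k = int(round(len(trained) * prune_fraction))  # number of weights to remove
    by_magnitude = sorted(range(len(trained)), key=lambda i: abs(trained[i]))
    mask = [1] * len(trained)
    for i in by_magnitude[:k]:
        mask[i] = 0
    return mask


def winning_ticket(
    initial: List[float], trained: List[float], prune_fraction: float
) -> Tuple[List[float], List[int]]:
    """Keep the weights that survived pruning, rewound to their initial values."""
    mask = magnitude_mask(trained, prune_fraction)
    ticket = [w0 if keep else 0.0 for w0, keep in zip(initial, mask)]
    return ticket, mask


if __name__ == "__main__":
    initial = [0.5, -0.2, 0.1, 0.7, -0.3]    # weights at random init (made up)
    trained = [1.2, -0.05, 0.9, 0.01, -1.5]  # same weights after training (made up)
    ticket, mask = winning_ticket(initial, trained, prune_fraction=0.6)
    print(mask)    # → [1, 0, 0, 0, 1]
    print(ticket)  # → [0.5, 0.0, 0.0, 0.0, -0.3]
```

The resulting ticket would then be retrained with the mask held fixed, so the pruned weights stay at zero.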

That said, I don't think we are going to be able to step away from large networks any time soon. It seems that when you have more parameters to optimize, you have more degrees of freedom and are less likely to get stuck in a local minimum. That's why it's actually easier to train a larger network to solve a task than a smaller one, even though both are wildly overparameterized.

[1] https://arxiv.org/abs/1803.03635


Very cool, thanks!


I think recurrent networks will beat feedforward ones: figure out a way to feed some outputs of the upper layers back into the lower layers ("fine-tuning/focus").
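One way to read this idea is as top-down feedback: run the network for a few steps, and each time feed the top layer's activations back into the lower layer's input. A minimal pure-Python sketch, assuming a two-layer tanh network and a hypothetical feedback matrix `U` (all names and weights are illustrative). With `U` set to zero it reduces to an ordinary feedforward pass.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def tanh_vec(v: Vector) -> Vector:
    return [math.tanh(x) for x in v]


def feedback_forward(x: Vector, W1: Matrix, W2: Matrix, U: Matrix, steps: int = 3) -> Vector:
    """Two-layer net whose top-layer output is fed back into the lower layer.

    W1: hidden x input, W2: output x hidden, U: hidden x output (feedback).
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    top = [0.0] * len(W2)  # no top-down signal on the first pass
    for _ in range(steps):
        lower = tanh_vec(add(matvec(W1, x), matvec(U, top)))
        top = tanh_vec(matvec(W2, lower))
    return top


if __name__ == "__main__":
    x = [1.0, -0.5]
    W1 = [[0.4, 0.1], [-0.3, 0.8], [0.2, 0.2]]
    W2 = [[0.5, -0.4, 0.3]]
    U = [[0.6], [-0.2], [0.1]]  # hypothetical feedback weights
    print(feedback_forward(x, W1, W2, U, steps=1))  # plain feedforward pass
    print(feedback_forward(x, W1, W2, U, steps=3))  # after two rounds of feedback
```

In a real model `U` would be learned along with `W1` and `W2`, for example by backpropagating through the unrolled steps as in a recurrent network.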



