I still think the next breakthrough will be when we figure out how to simplify/o...

kylevedder · on Sept 25, 2021

You should look at Frankle's work on the Lottery Ticket Hypothesis [1]. It turns out that in most cases you can remove 80+% of a network's weights and still get very similar output quality. The hypothesized reason is that some subnetworks of the randomly initialized network happen to draw a "winning lottery ticket": an initialization whose structure is already reasonably well suited to your end task and only needs further fine-tuning. Everything else mostly ends up being set dressing, which is why you can delete those weights without a major impact on performance.
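To make the "remove 80+% of the weights" part concrete, here is a rough sketch of one-shot magnitude pruning on a flat list of weights. It only illustrates the pruning step: the training loop, the iterative prune-and-retrain schedule, and the reset of surviving weights to their initial values are all left out. The function name `magnitude_prune`, the 80% sparsity, and the random stand-in weights are assumptions for illustration, not anything taken from the paper's code.

```python
import random


def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    `sparsity` is the fraction of weights to remove (0.8 removes 80%).
    Hypothetical helper for illustration only.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


# Stand-in for a trained layer: 1000 random weights (an assumption).
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

pruned = magnitude_prune(weights, sparsity=0.8)
kept = sum(1 for w in pruned if w != 0.0)
print(f"kept {kept} of {len(weights)} weights")
```

In a real experiment you would then retrain (or fine-tune) the surviving weights and compare accuracy against the unpruned network.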

That said, I don't think we'll be able to step away from large networks any time soon. It seems that when you have more parameters to optimize, you have more degrees of freedom and are less likely to get stuck in a local minimum, which is why it's actually easier to train a larger network to solve a task than a smaller one, even though both are wildly overparameterized.

[1] https://arxiv.org/abs/1803.03635

sam0x17 · on Sept 25, 2021

Very cool, thanks!

singularity2001 · on Sept 25, 2021

I think recursive networks will beat feed-forward ones: we need to figure out a way to feed some outputs of the upper layers back into the lower layers ("fine-tuning/focus").
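For what it's worth, here is a toy sketch of what that kind of feedback could look like: a two-layer network where the upper layer's previous output is added to the lower layer's input on each pass. Everything here is an assumption for illustration (the names `step` and `run`, the `tanh` activations, the hand-picked weights, the number of passes), and it says nothing about how such a network would be trained.

```python
import math


def step(x, feedback, w_in, w_fb, w_up):
    """One pass: the lower layer sees the input plus the upper layer's last output."""
    lower = [
        math.tanh(
            sum(wi * xi for wi, xi in zip(row_in, x))
            + sum(wf * f for wf, f in zip(row_fb, feedback))
        )
        for row_in, row_fb in zip(w_in, w_fb)
    ]
    upper = [math.tanh(sum(w * h for w, h in zip(row, lower))) for row in w_up]
    return upper


def run(x, w_in, w_fb, w_up, n_steps=3):
    """Iterate the network, feeding the upper output back into the lower layer."""
    feedback = [0.0] * len(w_up)  # no feedback on the first pass
    for _ in range(n_steps):
        feedback = step(x, feedback, w_in, w_fb, w_up)
    return feedback


# Hand-picked toy weights (assumptions): 2 inputs, 2 hidden units, 1 output.
x = [1.0, 2.0]
w_in = [[0.5, -0.3], [0.1, 0.8]]   # input -> lower layer
w_fb = [[0.4], [-0.2]]             # upper output -> lower layer (the feedback path)
w_up = [[1.0, -1.0]]               # lower layer -> upper layer

print("single pass (plain feed-forward):", run(x, w_in, w_fb, w_up, n_steps=1))
print("three passes with feedback:      ", run(x, w_in, w_fb, w_up, n_steps=3))
```

With a single pass the feedback is all zeros, so the result is identical to an ordinary feed-forward network; the extra passes are where the upper layer gets to influence the lower one.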