Am I understanding this right? Surely, I must be missing the entire point becaus...

nil-sec · on March 8, 2020

I think you do misunderstand. They do not add “correlated variables” to a model. The idea is that if you have an overparameterised model for a specific problem, this model contains a smaller model, that has similar performance to the trained large model, without training! That means gradient descent is in fact equivalent to pruning weights in a random network. There is no efficient algorithm for doing this (as they show), but that does not mean there are no (so far unknown) heuristics out there that would get you close. This is exciting because it means a potential alternative to backprop is out there. That would be cool because it might mean more efficient algorithms and, something I haven’t seen mentioned in the paper, an alternative to backprop that might be easier to understand in a biologically plausible way.

bonoboTP · on March 8, 2020

I think you misunderstand, especially this part:

> this model contains a smaller model, that has similar performance to the trained large model, without training

The point is the opposite. There is a small net X within big net Y, such that training only X gives the same performance as training all of Y.

nil-sec · on March 8, 2020

What you are stating is the original Lottery Ticket Hypothesis. What they prove in this paper is the stronger version, which was noticed empirically in https://arxiv.org/abs/1905.01067 and is referred to there as "supermasks". To quote from the paper posted here: "within a sufficiently overparameterized neural network with random weights (e.g. at initialization), there exists a subnetwork that achieves competitive accuracy".

Edit: See also https://arxiv.org/abs/1911.13299

bonoboTP · on March 9, 2020

Seems like a "Library of Babel" type of thing. I'd have to read the full paper to see how they find the subnets, but their mere existence is not so surprising. There's a huge sea of possible subnetworks. Basically, SGD is replaced by whatever procedure you use to traverse the space of parameter subsets. Definitely an interesting direction.
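
To make "traversing the space of parameter subsets" concrete, here is a minimal sketch of pruning a frozen random network instead of training it. It is not the procedure from the paper or from the supermask work; it is a naive hill climb over binary masks, and the toy target, layer sizes, and function names (`forward`, `mse`, `prune_by_hill_climbing`) are all made up for illustration. The weights are sampled once and never updated; only the masks change.

```python
import numpy as np

def forward(x, w1, w2, m1, m2):
    """Two-layer ReLU net whose fixed weights are gated by binary masks."""
    hidden = np.maximum(x @ (w1 * m1), 0.0)
    return hidden @ (w2 * m2)

def mse(x, y, w1, w2, m1, m2):
    return float(np.mean((forward(x, w1, w2, m1, m2) - y) ** 2))

def prune_by_hill_climbing(x, y, w1, w2, steps=2000, seed=0):
    """Flip one mask bit at a time; keep the flip only if the loss does not rise."""
    rng = np.random.default_rng(seed)
    m1, m2 = np.ones_like(w1), np.ones_like(w2)
    best = mse(x, y, w1, w2, m1, m2)
    for _ in range(steps):
        mask = m1 if rng.random() < 0.5 else m2
        idx = tuple(rng.integers(0, size) for size in mask.shape)
        mask[idx] = 1.0 - mask[idx]          # toggle keep/prune for one weight
        loss = mse(x, y, w1, w2, m1, m2)
        if loss <= best:
            best = loss
        else:
            mask[idx] = 1.0 - mask[idx]      # revert a flip that hurt
    return m1, m2, best

# Toy setup (an assumption for illustration): fit y = x0 * x1 on 64 points.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(64, 2))
y = x[:, :1] * x[:, 1:]
w1 = rng.normal(size=(2, 32))                # random weights, never trained
w2 = rng.normal(size=(32, 1))
w1_before, w2_before = w1.copy(), w2.copy()

dense_loss = mse(x, y, w1, w2, np.ones_like(w1), np.ones_like(w2))
m1, m2, pruned_loss = prune_by_hill_climbing(x, y, w1, w2)
kept = int(m1.sum() + m2.sum())
print(f"dense loss {dense_loss:.4f}, pruned loss {pruned_loss:.4f}, "
      f"weights kept {kept}/{m1.size + m2.size}")
```

By construction the pruned loss can never be worse than the unpruned random network, but nothing here guarantees it gets close to a trained network. That gap is where the hardness result and the hoped-for heuristics mentioned above come in.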