So it is somewhat like a classic random forest or maybe bagging, where you're trying to stop overfitting, but you're also trying to train that top layer to know who could be the "experts" given the current inputs so that you're minimising the number of multiple MLP sublayers called during inference?