Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

So it is somewhat like a classic random forest or maybe bagging, where you're trying to stop overfitting, but you're also trying to train that top layer to know who could be the "experts" given the current inputs so that you're minimising the number of multiple MLP sublayers called during inference?


Yea, it's very much bagging + top layer (router) for the importance score!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: