
It misses the crucial detail that every transformer layer chooses its experts independently of the others. Of course, the layers still influence each other indirectly, since each one processes the output of the previous one.
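To make the per-layer routing concrete, here is a rough toy sketch in plain Python. Everything in it is an assumption made for illustration, not any real model's code: the names `MoELayer` and `run_stack`, the random weights, the top-k softmax gating, and the residual connection. The only point is that each layer owns its own router, so the experts picked for a token at one layer say nothing directly about the experts picked at the next.

```python
import math
import random


def make_matrix(rows, cols, rng):
    # Random weights stand in for trained parameters (illustration only).
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def top_k(scores, k):
    # Indices of the k largest scores.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


class MoELayer:
    """Hypothetical MoE layer: one router plus a set of experts per layer."""

    def __init__(self, dim, num_experts, k, rng):
        self.k = k
        # The router belongs to this layer and is not shared with other layers.
        self.router = make_matrix(num_experts, dim, rng)
        self.experts = [make_matrix(dim, dim, rng) for _ in range(num_experts)]

    def forward(self, x):
        logits = matvec(self.router, x)
        chosen = top_k(logits, self.k)
        # Softmax over the selected logits gives the mixing weights.
        peak = max(logits[i] for i in chosen)
        exps = [math.exp(logits[i] - peak) for i in chosen]
        total = sum(exps)
        gates = [e / total for e in exps]
        out = list(x)  # residual connection (assumed)
        for gate, i in zip(gates, chosen):
            y = matvec(self.experts[i], x)
            out = [o + gate * yi for o, yi in zip(out, y)]
        return out, chosen


def run_stack(layers, x):
    # Each layer routes on the hidden state it receives from the previous one.
    choices = []
    for layer in layers:
        x, chosen = layer.forward(x)
        choices.append(chosen)
    return x, choices


rng = random.Random(0)
layers = [MoELayer(dim=4, num_experts=8, k=2, rng=rng) for _ in range(3)]
_, choices = run_stack(layers, [1.0, 0.5, -0.5, 2.0])
for depth, chosen in enumerate(choices):
    print(f"layer {depth}: experts {chosen}")
```

The indirect influence shows up in `run_stack`: the input each router sees is the previous layer's output, so changing which experts fire early on changes the hidden state that later routers score.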

