Hacker News
samus
on April 17, 2024
on:
Mixtral 8x22B
It misses the crucial detail that every transformer layer chooses its experts independently of the other layers. Of course, the layers still influence each other indirectly, since each one processes the output of the previous one.
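To make that concrete, here is a rough toy sketch of per-layer routing. It is not Mixtral's actual code: the class and helper names are made up for illustration, the sizes (8 experts, top 2 per token, hidden size 4) are assumptions, and each "expert" is just a random matrix followed by tanh rather than a real feed-forward block. The point is only that every layer owns its own router, so the experts picked at one depth say nothing directly about the picks at the next, while the hidden state passed along still couples them.

```python
import math
import random

NUM_EXPERTS = 8  # assumption: experts per layer
TOP_K = 2        # assumption: experts active per token
DIM = 4          # toy hidden size, purely illustrative


def make_matrix(rng, rows, cols):
    """Random rows x cols matrix as a list of lists."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a short list."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMoELayer:
    """Hypothetical layer: one private router plus NUM_EXPERTS toy experts."""

    def __init__(self, rng):
        # The router belongs to this layer only; nothing is shared across layers.
        self.router = make_matrix(rng, NUM_EXPERTS, DIM)
        self.experts = [make_matrix(rng, DIM, DIM) for _ in range(NUM_EXPERTS)]

    def forward(self, hidden):
        # Score every expert for this token, then keep the TOP_K best.
        logits = matvec(self.router, hidden)
        ranked = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)
        chosen = ranked[:TOP_K]
        weights = softmax([logits[i] for i in chosen])

        # Weighted sum of the chosen experts' outputs.
        mixed = [0.0] * DIM
        for weight, idx in zip(weights, chosen):
            out = matvec(self.experts[idx], hidden)
            mixed = [m + weight * math.tanh(o) for m, o in zip(mixed, out)]

        # Residual connection: the next layer routes on this updated state.
        return [h + m for h, m in zip(hidden, mixed)], chosen


def route_token(layers, token):
    """Run one token through all layers and record each layer's expert picks."""
    hidden = list(token)
    trace = []
    for layer in layers:
        hidden, chosen = layer.forward(hidden)
        trace.append(chosen)
    return hidden, trace


rng = random.Random(0)
layers = [ToyMoELayer(rng) for _ in range(4)]
final_hidden, trace = route_token(layers, [0.5, -1.0, 0.25, 2.0])
for depth, chosen in enumerate(trace):
    print(f"layer {depth}: experts {chosen}")
```

Running it prints one pair of expert indices per layer for the same token. The indirect influence shows up in `forward`: each router only ever sees the hidden state produced by the layers below it.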