
This makes sense. One tweak for the press: I think it would be an improvement to call it OptionalAttention rather than QuietAttention, since the goal is to permit an attention head to opt out.

You might attract more, ahem, attention if it were immediately apparent from the name alone what this attention head does that the current one does not. There's also the small matter of distinguishing the internal softmax from the output softmax.


