
The family of sigmoid functions has nice gradient properties with theoretical backing. Good starting read: https://stats.stackexchange.com/questions/162988/why-sigmoid...
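For a concrete look at one of those gradient properties: the logistic sigmoid's derivative can be written in terms of its own output, sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)), so it is cheap to compute and never exceeds 0.25. The sketch below is only an illustration of that identity, not something taken from the linked answer; the function names are made up for the example, and it covers the logistic function only, not the whole sigmoid family.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, branching on sign to avoid overflow in exp()."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x: float) -> float:
    """Analytic derivative, using sigmoid'(x) = s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numeric_grad(f, x: float, h: float = 1e-5) -> float:
    """Central finite difference, used only to sanity-check the identity."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    for x in (-4.0, 0.0, 1.5, 10.0):
        print(x, sigmoid_grad(x), numeric_grad(sigmoid, x))
```

The derivative peaks at x = 0 and shrinks toward zero for large |x|, which is the saturation behavior that usually comes up when sigmoids are compared with ReLU.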


That doesn't really explain why not the more typical ReLU + layer norm, or some alternative.



