
To be less glib: just wait until a bunch of papers pick SwiGLU over ReLU, and then you can stop handwringing. It doesn't really matter whether there was a super specific, concrete, well-articulated reason that SwiGLU worked well for their particular approach. For now you're still going to default to ReLU and just quickly try SwiGLU regardless.
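
For anyone not familiar with the two activations being compared, here's a minimal NumPy sketch. The projection shapes and the use of beta=1 (i.e. SiLU) in the Swish gate are illustrative assumptions, not anything from this thread:

  import numpy as np

  def relu(x):
      # ReLU: zero out negative entries.
      return np.maximum(0.0, x)

  def swiglu(x, W, V):
      # SwiGLU gates one linear projection (x @ W), passed through
      # Swish/SiLU (x * sigmoid(x)), with a second projection (x @ V).
      gate = (x @ W) * (1.0 / (1.0 + np.exp(-(x @ W))))
      return gate * (x @ V)

  x = np.random.randn(4, 8)                        # batch of 4, width 8
  W, V = np.random.randn(8, 16), np.random.randn(8, 16)
  print(relu(x).shape, swiglu(x, W, V).shape)      # (4, 8) (4, 16)

So the practical difference is that SwiGLU needs an extra projection (the V matrix) per feed-forward layer, whereas ReLU is a plain elementwise function.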

It's fine, I waited a bit before default-adopting ReLU over Tanh for all hidden layers (i.e. every layer except the final one that outputs a probability).
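
As a concrete illustration of that convention, here's a toy two-layer MLP sketch in NumPy: ReLU (rather than Tanh) on the hidden layer, and a sigmoid only on the final layer that produces a probability. The layer sizes are made up for the example:

  import numpy as np

  def mlp_forward(x, W1, b1, W2, b2):
      h = np.maximum(0.0, x @ W1 + b1)          # hidden layer: ReLU, not tanh
      logits = h @ W2 + b2
      return 1.0 / (1.0 + np.exp(-logits))      # final layer: sigmoid -> probability

  x = np.random.randn(4, 8)
  W1, b1 = np.random.randn(8, 16), np.zeros(16)
  W2, b2 = np.random.randn(16, 1), np.zeros(1)
  print(mlp_forward(x, W1, b1, W2, b2).shape)   # (4, 1)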



Thanks a lot for your explanations :)



