I developed this originally for llamafile, which was included in the last two releases: https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.8.2 Now we're upstreaming it to the llama.cpp project. There are other performance enhancements you can currently only get from llamafile, such as Kawrakow's work making K quants go much faster.
Is that just because nobody has made an effort yet to port them upstream, or is there something inherently difficult about making those changes work in llama.cpp?
I get the impression most llama.cpp users are interested in running models on GPU. AFAICT this optimization is CPU-only. Don't get me wrong, it's a huge one, and it opens the door to running llama.cpp on more and more edge devices.
Wait, computing SiLU directly using some numerical analysis would probably be a lot faster than doing an exp each time. Is there a significant perf impact from going through exp?
With expf() most of the work had already been done, so I could kill two birds with one stone. If you want to work out the math for computing SiLU directly, that'd be an awesome change I'd happily merge into llamafile. You might even be able to get it into PyTorch and other bigger-name projects.
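For anyone following along, here's a minimal sketch of the two paths being discussed. This isn't llamafile's code and the function names are mine; it just shows that SiLU(x) = x * sigmoid(x) = x / (1 + exp(-x)) can be computed by reusing an expf() kernel, or via the tanh identity sigmoid(x) = (1 + tanh(x/2)) / 2. The tanh form is mathematically exact; a real "direct SiLU" kernel would presumably replace tanhf() with its own polynomial approximation, which is the numerical-analysis work being invited above.

    #include <math.h>
    #include <stdio.h>

    /* SiLU through expf(), i.e. the "reuse the fast exp kernel" path:
     *   SiLU(x) = x * sigmoid(x) = x / (1 + exp(-x)) */
    static inline float silu_via_expf(float x) {
        return x / (1.0f + expf(-x));
    }

    /* Same function through the tanh identity:
     *   sigmoid(x) = (1 + tanh(x/2)) / 2, so SiLU(x) = 0.5 * x * (1 + tanh(x/2)).
     * A direct kernel could substitute its own approximation for tanhf() here. */
    static inline float silu_via_tanh(float x) {
        return 0.5f * x * (1.0f + tanhf(0.5f * x));
    }

    int main(void) {
        /* Sanity check: both forms agree to float precision. */
        for (int i = -6; i <= 6; i += 2) {
            float x = (float) i;
            printf("x=%+d  via expf: %+.6f  via tanh: %+.6f\n",
                   i, silu_via_expf(x), silu_via_tanh(x));
        }
        return 0;
    }

Whether the direct form actually wins depends on how cheap the vectorized expf() already is, which is the trade-off described above.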