Hacker News

With regard to language models/transformers, the neural engine/NPU is still potentially useful for the prompt pre-processing (prefill) step, which is generally compute-bound. Token generation, by contrast, needs memory bandwidth, so GPU compute with neural/tensor accelerators is preferable there.
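To make the compute-vs-bandwidth split concrete, here's a rough roofline-style sketch. All hardware and model numbers are illustrative assumptions, not measurements of any specific chip:

```python
# Back-of-envelope estimate for a 7B-parameter model in fp16.
# Every number below is an assumed, illustrative figure.

params = 7e9
bytes_per_param = 2                        # fp16
weight_bytes = params * bytes_per_param    # ~14 GB of weights

mem_bw = 400e9     # assumed memory bandwidth, bytes/s
compute = 100e12   # assumed accelerator throughput, FLOP/s

# Decode: each new token re-reads all weights once -> bandwidth-bound.
decode_time = weight_bytes / mem_bw        # seconds per generated token
decode_tokens_per_s = 1 / decode_time

# Prefill: an N-token prompt is one large batched matmul,
# roughly 2 FLOPs per parameter per token -> compute-bound for large N.
n_prompt = 1024
prefill_flops = 2 * params * n_prompt
prefill_time = prefill_flops / compute     # seconds for the whole prompt

print(f"decode: ~{decode_tokens_per_s:.0f} tokens/s (bandwidth-limited)")
print(f"prefill of {n_prompt} tokens: ~{prefill_time:.2f} s (compute-limited)")
```

Under these assumed numbers decode speed scales with memory bandwidth and barely touches the FLOP budget, while prefill does the reverse, which is why a compute-heavy NPU can still earn its keep on the prefill side.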


I think I'd still rather have that hardware area put into tensor cores for the GPU than into this unit, which is only programmable via ONNX.


