Hacker News

I think it was trained in BF16 and then quantized to FP8, but I'm not fully sure - I was also trying to find out whether they used FP8 natively for training!


Qwen uses 16-bit, Kimi and DeepSeek use FP8.


Oh ok cool thanks!
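The post-training quantization speculated about above (train in BF16, then round the weights down to FP8) can be sketched roughly. This is a simplified illustration of e4m3 FP8 rounding only - the function name is my own, and subnormals, NaN handling, and per-tensor scaling are all skipped - not how any of these labs actually quantize:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round a float to the nearest FP8 e4m3 value (sketch).

    e4m3 has a 4-bit exponent and a 3-bit mantissa, with a max
    normal value of 448. Subnormals, NaN, and scaling factors
    are ignored here for simplicity; real FP8 kernels handle them.
    """
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    mag = min(abs(x), 448.0)          # clamp to the e4m3 max value
    exp = math.floor(math.log2(mag))  # which power-of-two octave we're in
    exp = max(exp, -6)                # smallest normal exponent in e4m3
    step = 2.0 ** (exp - 3)          # 3 mantissa bits -> 8 steps per octave
    return sign * round(mag / step) * step

# BF16-trained weights (plain floats here) squeezed into FP8:
weights = [0.3, -1.7, 500.0, 0.03125]
print([quantize_e4m3(w) for w in weights])
```

Note how 500.0 clamps to 448.0 and 0.3 lands on the nearest representable value, 0.3125 - that rounding error is the price of the smaller format, which FP8-native training (as with DeepSeek) avoids paying at inference time.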





