
Oh yes, for FP8 you will need around 500GB; 4-bit is around 250GB. Offloading MoE experts / layers to RAM will definitely help - as you mentioned, a 24GB card should be enough!
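For anyone who wants to sanity-check those numbers, here is a rough weights-only sketch (it ignores KV cache, activations, and runtime overhead, so real usage will be somewhat higher):

  # Weights-only memory estimate for a 480B-parameter model.
  # Ignores KV cache, activations, and framework overhead.
  params = 480e9

  bytes_per_param = {"BF16": 2.0, "FP8": 1.0, "4-bit": 0.5}

  for name, b in bytes_per_param.items():
      print(f"{name}: ~{params * b / 1e9:.0f} GB")

  # BF16: ~960 GB, FP8: ~480 GB, 4-bit: ~240 GB,
  # roughly matching the ~500GB FP8 / ~250GB 4-bit figures above.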


Do we know if the full model is FP8 or FP16/BF16? The Hugging Face page says BF16: https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct

So likely it needs 2x the memory.
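A quick way to check without downloading the weights is to read the checkpoint's config.json (a sketch assuming the repo follows the usual transformers layout with a top-level torch_dtype field):

  import json
  from huggingface_hub import hf_hub_download

  # Fetch only the few-KB config.json, not the multi-hundred-GB weights.
  path = hf_hub_download(
      repo_id="Qwen/Qwen3-Coder-480B-A35B-Instruct",
      filename="config.json",
  )
  with open(path) as f:
      config = json.load(f)

  # torch_dtype reports the precision the checkpoint is stored in.
  print(config.get("torch_dtype"))  # "bfloat16" per the model page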


I think it's trained in BF16 and then quantized to FP8, but I'm not fully sure - I was also trying to find out if they used FP8 for training natively!


Qwen uses 16-bit, Kimi and DeepSeek use FP8.


Oh ok cool thanks!



