
The number of tokens a model was trained on is separate from the model's size; the training data isn't stored in the weights.

Gemma 3 270M was trained on 6 trillion tokens but can be loaded into a few hundred megabytes of memory.
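
As a rough sanity check, weight memory is roughly parameter count × bytes per parameter. A minimal sketch in Python (the byte sizes are standard precision conventions; only the 270M parameter count comes from the comment above):

    # Back-of-the-envelope weight memory for a 270M-parameter model.
    PARAMS = 270_000_000  # Gemma 3 270M parameter count

    bytes_per_param = {
        "fp32": 4.0,    # full precision
        "fp16": 2.0,    # half precision
        "int8": 1.0,    # 8-bit quantization
        "int4": 0.5,    # 4-bit quantization
    }

    for precision, nbytes in bytes_per_param.items():
        mb = PARAMS * nbytes / 1e6  # megabytes (10^6 bytes)
        print(f"{precision}: ~{mb:,.0f} MB")

That works out to roughly 1,080 MB at fp32 down to ~135 MB at int4, so "a few hundred megabytes" holds at half precision or below. (Actual usage is somewhat higher once you add activations and the KV cache.)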

But yeah, GPT-4 is certainly way bigger than 45 GB.


