Gemma 3 270M was trained on 6 trillion tokens but can be loaded into a few hundred million bytes of memory.
But yeah GPT-4 is certainly way bigger than 45GB.
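For a rough sense of the arithmetic, here's a back-of-the-envelope sketch. It assumes the nominal 270 million parameter count implied by the model name and a handful of common storage widths (the `BYTES_PER_PARAM` table and `weights_size_bytes` helper are made up for illustration). It only counts the weights, so activations, KV cache, and runtime overhead are ignored.

```python
# Rough estimate of weight storage: parameter count times bytes per parameter.
# Assumption: 270M is the nominal count taken from the model name, not an exact figure.
GEMMA_3_270M_PARAMS = 270_000_000

# Assumed storage widths for common numeric formats (illustrative, not exhaustive).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weights_size_bytes(num_params: int, bytes_per_param: float) -> int:
    """Return the approximate size of the weights alone, in bytes."""
    return int(num_params * bytes_per_param)


if __name__ == "__main__":
    for fmt, width in BYTES_PER_PARAM.items():
        size = weights_size_bytes(GEMMA_3_270M_PARAMS, width)
        print(f"{fmt}: ~{size / 1e6:.0f} MB")
```

Under those assumptions, anything at 16 bits or below lands in the "few hundred million bytes" range, while the training data is measured in trillions of tokens.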