
The number of tokens a model was trained on is separate from the model's size; the training data isn't stored in the weights.

Gemma 3 270M was trained on 6 trillion tokens but can be loaded into a few hundred megabytes of memory.
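
As a rough sanity check, weight memory is roughly parameter count × bytes per parameter. A minimal sketch in Python (the byte sizes are standard precision conventions; only the 270M parameter count comes from the comment above):

    # Back-of-the-envelope weight memory for a 270M-parameter model.
    PARAMS = 270_000_000  # Gemma 3 270M parameter count

    bytes_per_param = {
        "fp32": 4.0,    # full precision
        "fp16": 2.0,    # half precision
        "int8": 1.0,    # 8-bit quantization
        "int4": 0.5,    # 4-bit quantization
    }

    for precision, nbytes in bytes_per_param.items():
        mb = PARAMS * nbytes / 1e6  # megabytes (10^6 bytes)
        print(f"{precision}: ~{mb:,.0f} MB")

That works out to roughly 1,080 MB at fp32 down to ~135 MB at int4, so "a few hundred megabytes" holds at half precision or below. (Actual usage is somewhat higher once you add activations and the KV cache.)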

But yeah, GPT-4 is certainly way bigger than 45 GB.


