Number of params is the number of weights. Basically the number of learnable variables.
Number of tokens is how many tokens it saw during training.
Vocab size is the number of distinct tokens.
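To make the three numbers concrete, here's a toy Python sketch (whitespace splitting standing in for real BPE, with a made-up corpus and layer size):

    # Toy stand-in for BPE: split on whitespace, just to show how the numbers differ.
    corpus = ["the cat sat on the mat", "the dog sat on the log"]

    tokens = [tok for text in corpus for tok in text.split()]
    vocab = set(tokens)

    print("tokens seen in training:", len(tokens))  # GPT-3 saw ~300B of these
    print("vocab size:", len(vocab))                # distinct token types (GPT-3 uses ~50k BPE tokens)

    # Param count = every learnable float in the network,
    # e.g. an embedding table plus one dense layer:
    d_model = 8
    n_params = len(vocab) * d_model           # embedding matrix
    n_params += d_model * d_model + d_model   # dense weights + bias
    print("parameters:", n_params)            # GPT-3 has ~175B of these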
The relationship between params, tokens, and compute, and how it affects model performance, is something people have studied a good deal: https://arxiv.org/pdf/2203.15556.pdf
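Back-of-the-envelope, using the commonly cited approximation that training compute is roughly 6 * params * tokens (the figures below are GPT-3's published numbers; the ~20 tokens-per-parameter heuristic is the Chinchilla paper's rough conclusion):

    params = 175e9   # GPT-3 parameter count
    tokens = 300e9   # GPT-3 training tokens
    print(f"training FLOPs ~ {6 * params * tokens:.2e}")  # ~3.15e+23

    # Chinchilla's rough finding: for a fixed compute budget, scale params and
    # tokens together, ending up near ~20 tokens per parameter.
    print(f"GPT-3 tokens per parameter: {tokens / params:.1f}")  # ~1.7, i.e. undertrained by that heuristic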
> GPT-3 is 175 billion parameters
Total newbie here. What do these two numbers mean?
If we run a huge number of texts through BPE, do we get an array with a length of 300B?
What's the number if we de-dup these tokens (the size of the vocab)?
Does 175B parameters mean there are ~175B (somewhat useful) floats in the pre-trained neural network?