
> GPT-3 was trained on 300 billion tokens of text from the internet and books:

> GPT-3 is 175 billion parameters

Total newbie here. What do these two numbers mean?

If we run a huge amount of text through BPE, do we get an array with a length of 300B?

What's the number if we de-dup those tokens (the size of the vocab)?

Does 175B parameters mean there are roughly 175B useful floats in the pre-trained neural network?



I’ll do my best.

Number of params is the number of weights. Basically the number of learnable variables.

Number of tokens is how many tokens it saw during training.

Vocab size is the number of distinct tokens.
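
To make the three numbers concrete, here's a minimal sketch. It assumes the open-source tiktoken library and its "r50k_base" BPE encoding (the GPT-3-era vocabulary); the layer-size arithmetic at the end is just an illustration, not GPT-3's actual architecture.

```python
import tiktoken

# The BPE encoding associated with GPT-3-era models (assumed here).
enc = tiktoken.get_encoding("r50k_base")

text = "GPT-3 was trained on 300 billion tokens of text."
ids = enc.encode(text)

# "Tokens" = how many BPE pieces the training text breaks into.
# For GPT-3's corpus that count was ~300B; here it's just a handful.
print("tokens in this snippet:", len(ids))

# "Vocab size" = how many *distinct* token ids the encoding defines,
# independent of how much text you feed through it.
print("vocab size:", enc.n_vocab)  # ~50,257 for this encoding

# "Parameters" = the learnable floats inside the network itself.
# A single dense layer at GPT-3's hidden size (12288) already has:
d_model = 12288
params_one_layer = d_model * d_model + d_model  # weights + biases
print("params in one dense layer:", params_one_layer)  # ~151M
```

Stacking dozens of such layers (plus attention and embedding matrices) is how you get to 175B.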

The relationship between params, tokens, and compute, and how it affects model performance, is something people have studied a good deal: https://arxiv.org/pdf/2203.15556.pdf
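
To put rough numbers on that relationship, here's a back-of-the-envelope sketch. The C ≈ 6·N·D FLOPs approximation and the ~20 tokens-per-parameter rule of thumb come from the scaling-law literature (the linked Chinchilla paper), quoted from memory rather than exactly.

```python
# N = parameters, D = training tokens, both for GPT-3.
N = 175e9
D = 300e9

# Common approximation: training compute ~ 6 * N * D FLOPs.
flops = 6 * N * D
print(f"approx training compute: {flops:.2e} FLOPs")  # ~3.2e23

print(f"tokens per parameter: {D / N:.2f}")           # ~1.7

# The Chinchilla guideline of roughly 20 tokens per parameter would
# call for ~3.5T tokens at this model size, which is why the paper
# argues GPT-3-sized models were trained on too little data.
print(f"tokens at 20 tok/param: {20 * N:.2e}")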



