When I read "from scratch", I assume they are doing pre-training, not just finetuning. Do you have a different take? Do you mean they're using the standard Llama architecture? I'm curious about the benchmarks!

