
Very cool. For the INT4 QAT model, what is the recommended precision for the activations, and for the keys and values stored in the KV cache?


For keys, you probably want at least q5 or q6; for values, q4 is fine.
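
To get a rough feel for what the extra bit or two buys, here is a minimal sketch that measures round-trip error for plain symmetric block quantization at several bit widths. It assumes q4/q5/q6 mean roughly 4, 5, and 6 bits per value; the block size of 32, the Gaussian test data, and the helper names are all made up for illustration, and this is not the exact format any particular runtime uses. It only shows how error shrinks with bit width, not why keys are more sensitive than values.

```python
import random

def quantize_block(values, bits):
    """Symmetric round-to-nearest quantization of one block, then dequantize."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for 4-bit, 15 for 5-bit
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return list(values)
    scale = amax / qmax                     # one scale shared by the whole block
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def rms_error(values, bits, block_size=32):
    """RMS difference between the original values and their quantized copies."""
    total = 0.0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        deq = quantize_block(block, bits)
        total += sum((a - b) ** 2 for a, b in zip(block, deq))
    return (total / len(values)) ** 0.5

random.seed(0)
# Stand-in for a slice of a K or V tensor (assumption: roughly Gaussian entries).
data = [random.gauss(0.0, 1.0) for _ in range(4096)]
errors = {bits: rms_error(data, bits) for bits in (4, 5, 6, 8)}
for bits, err in errors.items():
    print(f"{bits}-bit RMS error: {err:.4f}")
```

Each extra bit roughly halves the quantization step, so going from 4 to 5 or 6 bits for keys cuts the per-element error noticeably while costing only a little more cache memory.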



