Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I think gpt4o is probably doing some ocr as preprocessing. It's not really controversial to say the vmls today don't pick up fine grained details - we all know this. Can just look at the output of a vae to know this is true.


If so, it's better than any other ocr on the market.

I think they just train it on a bunch of text.

Maybe counting squares in a grid was not probably considered important enough to train for.


Why do you think it's probable? The much smaller llava that I can run in my consumer GPU can also do "OCR", yet I don't believe anyone has hidden any OCR engine inside llama.cpp.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: