
I have no actual info on this, but I always assumed they'd compute some multimodal embeddings of the screenshots and then retrieve semantically relevant ones by text? And yeah, they'd have to do it with on-device models, which doesn't seem out of reach?
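For what it's worth, a toy version of that retrieval loop might look like the sketch below. It uses an off-the-shelf CLIP-style model via the sentence-transformers library purely for illustration; the model name, paths, and query are placeholders, not whatever is actually shipped on-device.

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# CLIP-style model that maps images and text into a shared embedding space.
# "clip-ViT-B-32" is just an example checkpoint, not a claim about what runs on-device.
model = SentenceTransformer("clip-ViT-B-32")

# Index step: embed each screenshot once as it is captured (paths are hypothetical).
screenshot_paths = ["shots/0001.png", "shots/0002.png"]
image_embeddings = model.encode(
    [Image.open(p) for p in screenshot_paths],
    convert_to_tensor=True,
)

# Query step: embed the user's text and rank screenshots by cosine similarity.
query_embedding = model.encode(
    "the error dialog from the billing app",
    convert_to_tensor=True,
)
scores = util.cos_sim(query_embedding, image_embeddings)[0]
best = scores.argmax().item()
print(screenshot_paths[best], scores[best].item())
```

In practice you'd precompute the image embeddings and store them in a small on-device index, so a text query only needs one text-encoder forward pass plus a nearest-neighbor lookup.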

