Hacker News

_boffin_ 8 months ago | on: PDF to Text, a challenging problem

You’re aware that PDFs are containers that can hold content in various formats, and that those formats can be interleaved in different ways: layered on top of one another, mixed throughout, or combined in unexpected and unspecified ways that aren’t “parsable,” right? I would wager that they’re using OCR or an LLM in their pipeline.

andrethegiant 8 months ago

Could be. But the conversion is free, which leads me to believe LLMs aren’t involved.
