You’re aware that PDFs are containers that can hold various kinds of content, which can be layered on top of one another, interleaved throughout a page, or combined in unexpected and unspecified ways that aren’t reliably “parsable,” right?
I would wager that they’re using OCR/LLM in their pipeline.