Hacker News

We developers seem to dislike PDFs so much that we'll build LLMs and have them translate PDFs into Markdown.

Jokes aside, PDFs serve a good purpose, but getting data out of them is usually hard. They should carry something like an embedded Markdown version plus a JSON structure describing the layout, so that machines can easily digest the data they contain.



I think you might be looking for PDF/A.

https://www.adobe.com/uk/acrobat/resources/document-files/pd...

For example, if you print a Word doc to PDF, you get the raw text in PDF form, not an image of the text.


PDF/A doesn't require preserving the document structure, only that any text is extractable.


> We developers seem to dislike PDFs so much that we'll build LLMs and have them translate PDFs into Markdown.

Why "jokes aside"? Markdown/HTML is better suited to the web than PDF.



