I want to rejoice that OCR is now a "solved" problem, but I feel like hallucinat...

qingcharles · on March 7, 2025

It depends on your use-case. For mine, I'm mining millions of scanned PDF pages to get approximate short summaries of long documents. The occasional hallucination won't damage the project. I realize I'm an outlier, and I would obviously prefer a solution that was as accurate as possible.

eMPee584 · on March 7, 2025

possibly doing both & diffing the output to spot contested bits?
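A minimal sketch of the diffing idea. It assumes both OCR engines have already produced plain text for the same page. `contested_spans` is a hypothetical helper name, and the comparison uses Python's standard `difflib`, not anything from an OCR library. Diffing on whitespace-split words rather than characters keeps the output readable.

```python
import difflib


def contested_spans(text_a: str, text_b: str) -> list[tuple[str, str]]:
    """Return (engine_a_words, engine_b_words) pairs where the two OCR outputs disagree."""
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False so frequent tokens on long pages aren't skipped by difflib's heuristic
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # 'replace', 'delete', or 'insert': one engine read something the other didn't
            spans.append((" ".join(words_a[i1:i2]), " ".join(words_b[j1:j2])))
    return spans


if __name__ == "__main__":
    a = "Total due: $1,250.00 by March 7, 2025"
    b = "Total due: $1,750.00 by March 7, 2025"
    for left, right in contested_spans(a, b):
        print(f"engine A: {left!r}  vs  engine B: {right!r}")
```

An empty string on one side means that engine saw no words where the other one did.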

spudlyo · on March 7, 2025

that’s my current idea: use two different OCR models and diff the results to spot-check for errors. at these prices, why not?
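For spot-checking at scale, one option is to score each page by how closely the two outputs agree and review only the worst pages. This is a sketch under stated assumptions: `agreement` and `pages_to_spot_check` are hypothetical helpers, pages arrive as a dict of page id to (engine A text, engine B text), and the 0.98 threshold is an arbitrary placeholder you would tune on your own data.

```python
import difflib


def agreement(text_a: str, text_b: str) -> float:
    """Word-level similarity in [0, 1]; 1.0 means the two OCR outputs are identical."""
    # difflib's ratio() is 2*M/T (M = matched words, T = total words); it returns 1.0 for two empty inputs
    return difflib.SequenceMatcher(None, text_a.split(), text_b.split(), autojunk=False).ratio()


def pages_to_spot_check(
    pages: dict[str, tuple[str, str]], threshold: float = 0.98
) -> list[tuple[str, float]]:
    """Return (page_id, agreement) for pages scoring below the threshold, worst first."""
    scored = [(page_id, agreement(a, b)) for page_id, (a, b) in pages.items()]
    flagged = [item for item in scored if item[1] < threshold]
    return sorted(flagged, key=lambda item: item[1])


if __name__ == "__main__":
    pages = {
        "p1": ("the quick brown fox", "the quick brown fox"),
        "p2": ("the quick brown fox", "the quick brawn fox"),
        "p3": ("invoice total 1250", "lnvoice tota1 1750"),
    }
    print(pages_to_spot_check(pages))
```

Word-level scoring is cheap but coarse: one wrong glyph costs a whole word. A character-level ratio is finer-grained but slower on long pages. Also, two engines can agree on the same wrong reading, so high agreement is not proof of correctness.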