A high level summary is that while this is an impressive model, it underperforms even current SOTA VLMs on document parsing and has a tendency to hallucinate with OCR, table structure, and drop content.
... And? We're judging it for the merits of the technology it purports to be, not the pockets of the people that bankroll them. Probably not fair - sure, but when I pick my OCR, I want to pick SOTA. These comparisons and announcements help me find those.
A high level summary is that while this is an impressive model, it underperforms even current SOTA VLMs on document parsing and has a tendency to hallucinate with OCR, table structure, and drop content.