
We ran some benchmarks comparing against Gemini Flash 2.0. You can find the full writeup here: https://reducto.ai/blog/lvm-ocr-accuracy-mistral-gemini

A high-level summary is that while this is an impressive model, it underperforms even current SOTA VLMs on document parsing and has a tendency to hallucinate OCR text and table structure and to drop content.



Anecdotally, we also found Gemini Flash to be better.


Meanwhile, you're comparing it to the output of an almost trillion-dollar company.


The tagline boasts that it is "introducing the world’s best document understanding API". So, holding them to their marketing seems fair.


Isn't anyone who releases anything putting "the world's best blablabla" on their page nowadays? I've become entirely blind to it.


If they put it, and it's subpar, I write off the product.


... And? We're judging it on the merits of the technology it purports to be, not the pockets of the people that bankroll them. Probably not fair, sure, but when I pick my OCR, I want to pick SOTA. These comparisons and announcements help me find those.


Comparisons to more outputs coming soon!



