I agree that a perfectly consistent dataset won't completely stop statistical language models from hallucinating, but it will reduce it. I think it is established that data quality matters more than quantity. Bullshit in -> bullshit out, so a focus on data quality is good and needed IMO.

I am also saying LM output should cite sources and give confidence scores (which reflect how much the output is in or out of the training distribution).
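One cheap proxy for that is the model's own token probabilities: the further a completion sits from the training distribution, the lower its average log-probability tends to be. A minimal sketch, assuming a Hugging Face causal LM ("gpt2" here is only a stand-in):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def sequence_confidence(text: str) -> float:
        # Score a piece of generated text by its average per-token likelihood.
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(ids, labels=ids)  # out.loss = mean negative log-likelihood
        # Geometric-mean token probability, between 0 and 1.
        return float(torch.exp(-out.loss))

    print(sequence_confidence("The capital of France is Paris."))

It's a crude signal (it conflates fluency with truth), but it's the kind of number that could be surfaced next to an answer.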



I think the problem is you need an extremely large quantity of data just to get the machine to work in the first place. So much so that there may not be enough to get it working on just "quality" data.


How would confidence scores work? Multiple passes, with a percentage attached to each statement according to how often it appears across the generated results? (Something like the sketch below.)

If so, building this could be quite complex depending on the domain. In the legal field, changing even a single word can have large consequences.
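For the simple case, the multiple-pass idea might look roughly like this; generate_answer is a hypothetical call into whichever model you're using:

    from collections import Counter

    def self_consistency(prompt, generate_answer, n=10):
        # Sample the same prompt n times (with temperature > 0) and attach a
        # score to each distinct answer based on how often it shows up.
        answers = [generate_answer(prompt) for _ in range(n)]
        counts = Counter(a.strip().lower() for a in answers)
        return {answer: count / n for answer, count in counts.items()}

    # e.g. {"paris": 0.9, "lyon": 0.1} -> 90% of samples agreed on "paris"

Deciding what counts as "the same statement" is exactly where this gets hard in a domain like law, as you say.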


What’s a non-statistical language model?

And I think looking to the training data for sources is a little silly - that’s the training data for intuitive language use, not true statements about the world. If you haven’t checked them out yet, two terms you’d love are “RAG” and “Manuel De Landa”
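For anyone unfamiliar, the basic shape of RAG is: retrieve relevant passages first, then have the model answer from those passages and cite them, rather than from whatever it absorbed during training. A toy sketch (naive keyword-overlap retrieval; llm is a hypothetical generate function):

    def retrieve(query, corpus, k=2):
        # corpus: {doc_id: text}. Rank documents by keyword overlap with the query.
        q = set(query.lower().split())
        scored = sorted(corpus.items(),
                        key=lambda kv: len(q & set(kv[1].lower().split())),
                        reverse=True)
        return scored[:k]

    def answer_with_sources(query, corpus, llm):
        # Prepend the retrieved passages and ask the model to cite their ids.
        passages = retrieve(query, corpus)
        context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
        prompt = (f"Answer using only the passages below and cite their ids.\n"
                  f"{context}\n\nQuestion: {query}\nAnswer:")
        return llm(prompt)

In practice the retriever is usually an embedding index rather than keyword overlap, but the point is the same: the sources come from the retrieval step, not from the training set.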



