> Arguably the bigger problem is that many of those datasets, e.g. WSJ articles, are proprietary and can be exclusively licensed, like we've seen recently with OpenAI.
> So we end up in a situation where competition is simply not possible.
Exactly, and Technofeudalism advances a little more into a new feud.
OpenAI is trying to build its moat by locking up training data, probably by keeping competitors from training on the same datasets it has been licensing, at least for a while. Training data is the only possible moat for LLMs: models seem to be advancing at a similar pace across different companies, but as mentioned here, a tidy training dataset is the actual gold.