Did Facebook ask permission to create derivative works (the bot) from Reddit posts, I wonder, or does this fall under web-scraping law?
If I recall, Reddit users still retain rights to their posts unless Reddit the company provides some sort of broad grant?
If they did not, this is an interesting example of a company potentially making a great deal of money (if the bot is sold as something) from content that legally belongs to users, without compensating them. It's one thing if it abides by a site user agreement and users understand that once they post it's gone, but to see it happen from a Reddit corpus seems odd.
Shorter version: source data has value and users should share in any value derived from their data if they have the rights to it.
Legally, https://towardsdatascience.com/the-most-important-supreme-co... gives a good example of how transformational machine learning classifiers generally fall under fair use. It does raise a good point that generative machine learning, like this, has not been explored legally yet.
This is still research which will likely provide public good if/when they publish results and methods. Probably, they'll do a different dataset for any commercial work given the profanity problem highlighted in the article.
Making or not making money is such a weird way for people to see things. That's part of why I love the Free Software movement so much and abhor the CC-*-NC licences.
Fortunately, Reddit has the exception where they can give out access to anyone they want. But I still think StackOverflow is the gold standard: CC-BY-SA. No restriction on making money. Maybe a platinum standard would be CC-BY.
The point is not about the money - the point is using data contributed by users without the proper license to create something that might yield revenue which will then not be shared or paid forward in any way to the contributors. We have all worked hard to create the data used by companies to sell ads to us and make massive amounts of money. I guess I got a couple gigs of free email? Cool...
I also understand that most apps make us sign our lives away, but if I don't (as in the Reddit case) and I actually have rights to the data I sure as heck don't want that data used ANYWAY to power more of this stuff.
Probably a gross overreaction, but it seems like an externality that we've kinda just accepted as a society, and I'd like to see that change a bit.
In Reddit's case, that's the deal. You get a website to share things on with other people, and the value exchange involves you giving full licence to Reddit and giving relicense rights to Reddit.
Personally, I find that a very fair deal and clearly other people do as well. I think it actually yields positive externalities because we get things that wouldn't exist otherwise because the transaction costs outweigh the value, but the transaction costs are an inherent cost and I don't want to levy them. Fortunately, Reddit gives me the ability to not levy them and to guarantee that I won't levy them.
In fact, this is part of the magic of Free Software: true freedom to use. Yes, Google can use so much work which was done and it doesn't have to pay any of it back to Torvalds or Greg Kroah-Hartman or even me for the minor changes I made to libraries. This is freedom. I prefer it. And fortunately the world is aligned in this direction.
I want to agree with you 100%, but something is nagging at me a bit. Just like free software that ends up in a paid product, with the company then winning or settling in court because it has more resources to use the judicial system, when we apply this directly as a societal value it starts to break down in practice.
The freedom you are talking about ends up justifying (in practice) a situation that only provides real freedom for a small few that happened to take advantage early and use other asymmetries in society to consolidate control. Sure, if we fix those, we're all set! (maybe?)
But until then perhaps we can agree that as a society we expect (and might ask for, by law) a little something extra from companies that have benefitted to help ensure others after them have a chance to use this freedom as well.
My argument is not as well thought out at this point, I grant you. Thanks for providing me with a lot to think about.