Hacker News

> No explicit content (that could be plagiarized or combined) is actually stored by GPT3: only statistical relationships between tokens are retained

When people ask GPT-3 to write a rhyming poem, you see plenty of examples of GPT-3 poems starting with "There once was a cat named Pat..." This is an extremely common first line of a limerick, found anywhere from here[1] to here[2]. I'm sure those "statistical relationships" are very strong; is it plagiarism? I'll leave it up to you to decide that, but I'm willing to bet that with enough finagling you can probably get it to spit out phrases from Moby Dick.

[1] https://www.google.com/books/edition/Selected_Vintage_Poetry...

[2] https://sites.google.com/site/rainydaypoems/poems-submitted-...



> When people ask GPT-3 to write a rhyming poem, you see plenty of examples of GPT-3 poems starting with "There once was a cat named Pat..." This is an extremely common first line of a limerick, found anywhere from here[1] to here[2].

Knowing that a "rhyming poem" is likely to start with a specific token (or set of tokens) does not exactly constitute "plagiarism", the same way that writing a poem that starts with "There once was a cat named Pat..." is not "plagiarism" by itself: it is just adhering to expected convention/norms of a specific literary format or genre.

Is using the basic-ass I-V-VI-IV chord progression in music "plagiarism", since it has been (and is) used by countless other people before?

> I'm sure those "statistical relationships" are very strong; is it plagiarism? I'll leave it up to you to decide that

Well... my claim is that it is clearly not plagiarism (and I gave specific arguments to support my claim). If you are not interested in arguing (which is fine), then I assume you accept that your characterization of what GPT-3 does as "plagiarism" is (at the very least) overly simplistic (i.e., just as simplistic as claiming that GPT-3 is sentient or actually intelligent).

> but I'm willing to bet that with enough finagling you can probably get it to spit out phrases from Moby Dick.

If GPT-3 (or most humans, for that matter) are asked to complete the phrase "To be or not to..." and decide that the word "be" is the most likely/reasonable completion, does it mean that GPT-3 (or any human, for that matter) is "plagiarizing" Shakespeare? Or does it simply mean that they are trying to address your question/problem to the best of their capabilities (and that they probably have read a passage or two of Shakespeare before, or someone paraphrasing Shakespeare)? In other words, just because you can force GPT-3 to output a specific copyrighted work (or an excerpt of it) still doesn't mean that what GPT-3 is doing should be characterized as "plagiarism".
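As an aside, the "most likely completion" behavior being argued about can be illustrated with a toy bigram model. This is a hypothetical sketch, not how GPT-3 actually works (a transformer conditions on far more context than the previous word), but it shows how "be" falls out of nothing more than token co-occurrence counts:

```python
from collections import Counter, defaultdict

# Toy training corpus that happens to contain a famous phrase.
corpus = (
    "to be or not to be that is the question "
    "to be fair the weather is fine "
    "not to worry"
).split()

# Count bigram frequencies: how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def complete(word):
    """Return the statistically most likely next word."""
    return follows[word].most_common(1)[0][0]

print(complete("to"))  # prints "be": the most frequent follower of "to"
```

The model has no notion of Shakespeare; "be" wins simply because it follows "to" more often than any other word in the corpus, which is the point being made about statistical relationships between tokens.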

Again, for something to technically count as "plagiarism", it is required that someone (i.e., not a computer program) try to pass something off (incorrectly) as their own original work, which does not seem to be the case here. That was my main point.

EDIT: if you want to be derisive of things like GPT-3, while still being accurate, it makes more sense to say things like "it is simply imitating" or "has no actual creativity" (which seem defensible to me) than things like "it is literally plagiarizing and copying what it saw before" (which seems much less defensible/accurate).


> Is using the basic-ass I-V-VI-IV chord progression in music "plagiarism", since it has been (and is) used by countless other people before?

This is not at all what's happening here. Complete red herring.

> If GPT-3 (or most humans, for that matter) are asked to complete the phrase "To be or not to..." and decide that the word "be" is the most likely/reasonable completion, does it mean that GPT-3 (or any human, for that matter) is "plagiarizing" Shakespeare?

The short answer is yes (imho), but let me put it this way: does GPT-3 know that when it's regurgitating "to be or not to be" it's actually regurgitating Shakespeare? My argument is that no, it does not know, precisely because to the model it just happens to be a very strong statistical way of stringing words together. When, in fact, it's a very famous phrase by a very famous person. So, in a way, it's "accidentally" plagiarizing, but plagiarizing nonetheless. Like if, for whatever reason, I had heard the phrase "it was the best of times, it was the worst of times" somewhere, but couldn't remember where, my ignorance wouldn't stop me from technically plagiarizing Charles Dickens if I blatantly reused the phrase without attribution.

> things like "it is literally plagiarizing and copying what it saw before" (which seems much less defensible/accurate).

This is literally what it's doing, though, under the guise of "statistical correlation." In fact, I've read reports of people using GPT-3-adjacent models that needed to add filters specifically to keep verbatim training data out of the output.
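For what it's worth, the kind of filter being described can be as crude as rejecting any output that shares a long verbatim word sequence with the training data. This is a hypothetical sketch (the function name, the n-gram length of 8, and the naive substring scan are all my own illustration; real deduplication pipelines are far more sophisticated):

```python
def shares_long_ngram(output, training_docs, n=8):
    """Return True if the output contains any n-word sequence
    that also appears verbatim in some training document."""
    words = output.lower().split()
    # All contiguous n-word windows of the candidate output.
    ngrams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
    return any(g in doc.lower() for g in ngrams for doc in training_docs)

training = ["It was the best of times, it was the worst of times ..."]

# A lifted Dickens phrase trips the filter; a fresh sentence does not.
print(shares_long_ngram(
    "it was the best of times, it was the worst of times", training))  # True
print(shares_long_ngram(
    "a completely original sentence about cats", training))  # False
```

The trade-off is that a filter like this only catches verbatim overlap; paraphrased regurgitation sails right through, which is partly why the "is it plagiarism" question is hard to settle mechanically.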


> Plagiarism is the *fraudulent representation* of another person's language, thoughts, ideas, or expressions *as one's own original work*.

Source: https://en.wikipedia.org/wiki/Plagiarism

The same way that you disagree when someone stretches the meaning of the word "talk" to encompass what GPT-3 does, I also disagree when you try to stretch the meaning of the word "plagiarism" to encompass what GPT-3 does (and I've explained exactly why: GPT-3 generates sequences of tokens, but makes no specific claim about the originality of the generated sequences of tokens).

We can agree to disagree, if you can't accept that "plagiarism" literally involves more than just "copyright infringement" or "replicating someone else's work from statistical correlations" or anything along those lines: it must also involve fraud or some other form of misrepresentation.

Either way, have a nice day.



