Opinion: The Copyright Office is making a mistake on AI-generated art (arstechnica.com)

submitted 2 years ago by [email protected] to c/[email protected]

I've generally been against giving AI works copyright, but this article presented what I felt were compelling arguments for why I might be wrong. What do you think?

[-] [email protected] 22 points 2 years ago

The strongest argument against AI art is that it is derivative of the copyrighted art it is based on. A photo of a copyrighted artwork would be similarly difficult to copyright. In this sense, AI art is more akin to music sampling in that it uses original material to make something new -- and to sample music you must ask permission.

[-] [email protected] 8 points 2 years ago* (last edited 2 years ago)

You can't copyright AI-generated art even if it was only trained with images in the public domain.

In fact, you can't copyright AI-generated art even if it was only trained with images that you made.

[-] [email protected] 2 points 2 years ago

I think this nails it. It's probably the attack authors will use against OpenAI.

But the Copyright Office clearly states otherwise, so we're in for a showdown.

Personally, I think the AI stuff seems more akin to writing a book in the style of another author, which is completely legal. And, to be clear, my opinion has no legal effect here whatsoever. 😅

[-] [email protected] 7 points 2 years ago* (last edited 2 years ago)

There are two separate issues here. First, can you copyright art that is completely AI-generated? The answer is no. So OpenAI cannot claim a copyright for its output, no matter how it was trained.

The other issue is whether OpenAI violated a copyright. It's true that if you write a book in the style of another author, then you aren't violating copyright. And the same is true of OpenAI.

But that's not really what the OpenAI lawsuit alleges. The issue is not what it produces today, but how it was originally trained. The authors point out that in the process of training OpenAI, the developers illegally downloaded their works. You can't illegally download copyrighted material, period. It doesn't matter what you do with it afterwards. And AI developers don't get a free pass.

Downloading pirated books for pleasure reading is illegal. Downloading pirated books to train an AI is equally illegal.

[-] [email protected] 2 points 2 years ago

[removed by mod]

[-] [email protected] 2 points 2 years ago* (last edited 2 years ago)

When determining whether something is fair use, the key questions are often whether the use of the work (a) is commercial, or (b) may substitute for the original work. Furthermore, the amount of the work copied is also considered.

Search engine scrapers are fair use, because they only copy a snippet of a work and a search result cannot substitute for the work itself. Likewise if you copy an excerpt of a movie in order to critique it, because consumers don't watch reviews as a substitute for watching movies.

On the other hand, OpenAI is accused of copying entire works, and OpenAI is explicitly intended as a replacement for hiring actual writers. I think it is unlikely to be considered fair use.

And in practice, fair use is not easy to establish.

[-] [email protected] 2 points 2 years ago

[removed by mod]

[-] [email protected] 3 points 2 years ago* (last edited 2 years ago)

I know the model doesn't contain a copy of the training data, but it doesn't matter.

If the copyrighted data is downloaded at any point during training, that's an IP violation. Even if it is immediately deleted after being processed by the model.

As an analogy, if you illegally download a Disney movie, watch it, write a movie review, and then delete the file ... then you still violated copyright. The movie review doesn't contain the Disney movie and your computer no longer has a copy of the Disney movie. But at one point it did, and that's all that matters.

[-] [email protected] 1 points 2 years ago

[removed by mod]

[-] [email protected] 2 points 2 years ago

No, it doesn't.

It defends web scraping (downloading copyrighted works) as legal if necessary for fair use. But fair use is not a foregone conclusion.

In fact, there was a recent case in which a company was sued for scraping images and texts from Facebook users. Their goal was to analyze them and create a database of advertising trackers, in competition with Facebook. The case settled, but not before the judge noted that the web scraper was not fair use and very likely infringing IP.

[-] [email protected] 1 points 2 years ago

[removed by mod]

[-] [email protected] 1 points 2 years ago* (last edited 2 years ago)

Yes, it absolutely hinges on fair use. That's why the very first page of the lawsuit alleges:

"Defendants’ LLMs endanger fiction writers’ ability to make a living, in that the LLMs allow anyone to generate—automatically and freely (or very cheaply)—texts that they would otherwise pay writers to create"

If the court agrees with that claim, it will basically kill the fair use defense.

[-] [email protected] 1 points 2 years ago

[removed by mod]

[-] [email protected] 1 points 2 years ago* (last edited 2 years ago)

"the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use upon the potential market."

Yes, and I named three of those factors:

"the key questions are often whether the use of the work (a) is commercial, or (b) may substitute for the original work. Furthermore, the amount of the work copied is also considered."

And while you don't need to meet all the criteria, the odds are pretty long when you fail three of the four (commercial nature, copying complete work rather than a portion, and negative effect on the market for the original).

Think of it this way: if it were legal to download books in order to train an AI, then it would also be legal to download books in order to train a human student. After all, why would a human have fewer rights than an AI?

Do you really think courts are going to decide that it's ok to download books from The Pirate Bay or Z-Library, provided they are being read by the next generation of writers?

[-] [email protected] 2 points 2 years ago

[removed by mod]

[-] [email protected] 2 points 2 years ago* (last edited 2 years ago)

Again, it's not a question of reproducing books in an LLM. The allegation is that the OpenAI developers downloaded books illegally to train their AI.

You need to pay for your copy of a book. That's true if you are a student teaching yourself to write, and it's also true if you are an AI developer training an AI to write. In the latter case, you might also need to pay for a special license.

Is it possible that the OpenAI developers can bring the receipts showing they paid for each and every book and/or license they needed to train their AI? Sure, it's possible. If so, the lawyers who brought the suit would look pretty silly for not even bothering to check.

But OpenAI used a whole lot of books, which cost a whole lot of money. So I wouldn't hold my breath.

[-] [email protected] 2 points 2 years ago

[removed by mod]

[-] [email protected] 1 points 2 years ago* (last edited 2 years ago)

Simple question:

If you are a college student, learning to write professionally, is it fair use to download copyrighted books from Z-Library in order to become a better writer? If you are a musician, is it fair use to download mp3s from The Pirate Bay in order to learn about musical styles? How about film students, can they torrent Disney movies as part of their education?

I'm certain that every court in the US would rule that this is not fair use. It's not fair use even if pirated content ultimately teaches a student how to create original, groundbreaking works of writing, music, and film.

Simply being a student does not give someone a free pass to pirate content. The same is true of training an AI, and there are already reports that pirated material is in the OpenAI training set.

If OpenAI could claim fair use, then almost by definition The Pirate Bay could claim fair use too.

[-] [email protected] 1 points 2 years ago

I'm happy with the illegal downloading being illegal. Where things get murky for me is what algorithms you're allowed to use on the data.

I get the impression that if they'd bought all the books legally, the lawsuit would still be happening.

[-] [email protected] 2 points 2 years ago

If they bought physical books then the lawsuit might happen, but it would be much harder to win.

If they bought e-books, then it might not have helped the AI developers. When you buy an e-book you are just buying a license, and the license might restrict what you can do with the text. If an e-book license prohibits AI training (and they will in the future, if they don't already) then buying the e-book makes no difference.

Anyway, I expect that in the future publishers will make sets of curated data available for AI developers who are willing to pay. Authors who want to participate will get royalties, and developers will have a clear license to use the data they paid for.

[-] [email protected] 2 points 2 years ago

Which works were sampled for this?

[-] [email protected] 4 points 2 years ago

Is that a picture of a straw man?

[-] [email protected] 2 points 2 years ago

It's characters from a popular TV show as knitted figures.

[-] [email protected] 2 points 2 years ago

I bet you could build a machine that could recognize subject matter from photographs of it more feasibly than you could build a machine that recognized training data from output.

this post was submitted on 22 Sep 2023

85 points (95.7% liked)

Technology
