this post was submitted on 26 Jul 2023
860 points (96.4% liked)

Technology

Thousands of authors demand payment from AI companies for use of copyrighted works

Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools, marking the latest intellectual property critique to target AI development.

you are viewing a single comment's thread
[–] [email protected] 44 points 2 years ago (6 children)

There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.

There are plagiarism and copyright laws to protect the output of these tools: if the output is infringing, then sue them. However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.

When you sell a book, you don’t get to control how that book is used. You can’t tell me that I can’t quote your book (within fair use restrictions). You can’t tell me that I can’t refer to your book in a blog post. You can’t dictate who may and may not read a book. You can’t tell me that I can’t give a book to a friend. Or an enemy. Or an anarchist.

Folks, this isn’t a new problem, and it doesn’t need new laws.

[–] [email protected] 57 points 2 years ago (22 children)

It's 100% a new problem. There's established precedent for things costing different amounts depending on their intended use.

For example, buying a consumer copy of a song doesn't give you the right to play that song in a stadium or a restaurant.

Training an entire AI to make a potentially infinite number of derived works from your work is 100% worthy of requiring a special agreement. This goes beyond simple payment, to consent: a climate expert might not want their work in an AI that might severely mischaracterize its conclusions, or might want to require that certain queries are regularly checked by a human, etc.

[–] [email protected] 2 points 2 years ago* (last edited 2 years ago) (2 children)

Well, fine, and I can't fault new published material having a "no AI" clause in its terms of service. But that doesn't mean we get to dream this clause into being retroactively for all the works ChatGPT was trained on. Even the most reasonable law in the world can't be enforced against someone who broke it six months before it was legislated.

Fortunately the "horse is out of the barn" effect here is maybe not so bad. Imagine the FOMO and user frustration when ToS and legislation catch up and now ChatGPT has no access to the latest books, music, news, research, everything - just stuff from before authors knew to include the "hands off" clause, basically like the knowledge cutoff, but forever. It's untenable; OpenAI will be forced to cave and pay up.

[–] [email protected] 11 points 2 years ago (4 children)

OpenAI and such being forced to pay a share seems far from the worst scenario I can imagine. I think it would be much worse if artists, writers, scientists, open source developers and so on were forced to stop making their works freely available because they don't want their creations to be used by others for commercial purposes. That could really mean that large parts of humanity would be cut off from knowledge.

I can well imagine copyleft gaining importance in this context. But this form of licensing seems pretty worthless to me if you don't have the time or resources to sue for your rights - or even to deal with the various forms of licensing you need to know about to do so.

[–] [email protected] 16 points 2 years ago (1 children)

This is a little off: when you quote a book, you cite the name of the book you're quoting. When you refer to a book, you, um, refer to the book?

I think the gist of these authors' complaints is that a sort of "technology-laundered plagiarism" is occurring.

[–] [email protected] 15 points 2 years ago (1 children)

I asked Bing Chat for the 10th paragraph of the first Harry Potter book, and it gave me this:

"He couldn’t know that at this very moment, people meeting in secret all over the country were holding up their glasses and saying in hushed voices: ‘To Harry Potter – the boy who lived!’"

It looks like technically I might be able to obtain the entire book (eventually) by asking Bing the right questions?

[–] [email protected] 4 points 2 years ago* (last edited 2 years ago) (2 children)

Then this is a copyright violation - it violates any reasonable standard for one, and the AI should be altered to account for that.

What I’m seeing is people complaining about content being fed into AI, and I can’t see why that should be a problem (assuming it was legally acquired or publicly available). Only the output can be problematic.

[–] [email protected] 5 points 2 years ago (1 children)

No, the AI should be shut down, and the owner should first be paying statutory damages for each use of registered copyrighted works (assuming all parties are in the USA).

If they have a company left after that, then they can fix the AI.

[–] [email protected] 8 points 2 years ago

Again, my point is that the output is what can violate the law, not the input. And we already have laws that govern fair use, rebroadcast, etc.

[–] [email protected] 4 points 2 years ago

I think it's not just the output. I can buy an image on any stock platform, print it on a T-shirt, and wear it myself or gift it to somebody. But if I want to sell T-shirts using that image, I need a commercial license - even if I alter the original image extensively or combine it with other assets to create something new. It's not exactly the same thing, but OpenAI and other companies certainly use copyrighted material to create and improve commercial products. So this doesn't seem like the same kind of usage an average Joe buys a book for.

[–] [email protected] 9 points 2 years ago (9 children)

However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.

It's an algorithm that's been trained on numerous pieces of media by a company looking to make money off it. I see no reason to give them a pass on fairly paying for that media.

You can see this if you reverse the comparison and consider what a human would do to accomplish the task in a professional setting. That's all an algorithm is: an execution of programmed tasks.

If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I'd get my ass sued. I have to buy the books and the scientific papers. STEM companies regularly pay for access to papers and codes and standards. Why shouldn't an AI have to do the same?

[–] [email protected] 10 points 2 years ago (2 children)

If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I’d get my ass sued. I have to buy the books and the scientific papers.

Well, if OpenAI knowingly used pirated work, that's one thing. It seems pretty unlikely and certainly hasn't been proven anywhere.

Of course, they could have done so unknowingly. For example, if John C Pirate published the transcripts of every movie since 1980 on his website, and OpenAI merely crawled his website (in the same way Google does), it's hard to make the case that they're really at fault any more than Google would be.

[–] [email protected] 2 points 2 years ago (2 children)

Well, no - because the summary is its own copyrighted work.

[–] [email protected] 2 points 2 years ago* (last edited 2 years ago)

The published summary is open to fair use by web crawlers. That was settled in Perfect 10 v. Amazon.

[–] [email protected] 1 points 2 years ago (2 children)

Haven't people asked it to reproduce specific chapters or pages of specific books and it's gotten it right?

[–] [email protected] 8 points 2 years ago (1 children)

There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.

That's part of the allegation, but it's unsubstantiated - and it isn't entirely coherent.

[–] [email protected] 3 points 2 years ago (2 children)

It's not entirely unsubstantiated. Sarah Silverman was able to get ChatGPT to regurgitate passages of her book back to her.

[–] [email protected] 3 points 2 years ago

Her lawsuit doesn't say that. It says,

when ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works—something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works

That's an absurd claim. ChatGPT has surely read hundreds, perhaps thousands of reviews of her book. It can summarize it just like I can summarize Othello, even though I've never seen the play.

[–] [email protected] 2 points 2 years ago

I don't know if this holds water, though. You don't need to train the AI on the book itself to get that result - just on discussions about the book, which for sure include passages from the book.