this post was submitted on 21 Aug 2023
644 points (95.4% liked)

Technology

The US Copyright Office offers creative workers a powerful labor protective.

[–] [email protected] 50 points 2 years ago (7 children)

In my opinion, the copyright should be based on the training data. Scraped the internet for data? Public domain. Handpicked your own dataset created completely by you? The output should still belong to you. Seems weird otherwise.

[–] [email protected] 9 points 2 years ago (1 children)

Totally. And if the data was scraped, they must be able to provide the source. I don't care if it costs them money/compute time. They are allowed to grow with fake money, after all.

[–] [email protected] 9 points 2 years ago

The issue here is whether you'd need to prove where your data came from. So the default should be public domain unless you can prove the source of all the training data.

[–] [email protected] 7 points 2 years ago

> Scraped the internet for data? Public domain.

Of course, just because material is on the internet does not mean that material is public domain.

So AI is likely the worst of both worlds: it can infringe copyright, and the publisher can be held liable for the infringement, but it offers no protection in and of itself down the line.

[–] [email protected] 6 points 2 years ago (1 children)

I think the next big thing is going to be proving the provenance of training data. Kinda like being able to track a burger back to the farm(s) to prevent the spread of disease.

There was an OnlyFans creator on a chat group for one of the less restricted machine learning image generators a while ago.
They provided a load of their content, and there was a cash prize for generating content that was indistinguishable from them.
Provided they were sure that the dataset was only their content, they might be able to claim copyright under this.
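
One rough way to picture that "only their content" check, as a minimal sketch and not anything the contest or any image generator actually used: keep a manifest with a content hash, a source, and a creator for each training item, then confirm that every entry names the same creator. The names `ProvenanceRecord`, `make_record`, and `sole_creator` are made up for illustration, and a real provenance scheme would need far more than this (licenses, signatures, audit trails).

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    """One training item plus where it came from (hypothetical schema)."""
    sha256: str   # content hash, so the item can be matched again later
    source: str   # where the item was obtained, e.g. a URL or "own upload"
    creator: str  # who made the item


def make_record(content: bytes, source: str, creator: str) -> ProvenanceRecord:
    """Hash the raw bytes and attach the claimed source and creator."""
    return ProvenanceRecord(hashlib.sha256(content).hexdigest(), source, creator)


def sole_creator(manifest: list) -> Optional[str]:
    """Return the creator if every record names the same one, else None."""
    creators = {record.creator for record in manifest}
    return creators.pop() if len(creators) == 1 else None


# Usage: a dataset made only of one person's uploads passes the check,
# while a single scraped item of unknown origin makes it fail.
own = [
    make_record(b"photo-1", "own upload", "alice"),
    make_record(b"photo-2", "own upload", "alice"),
]
mixed = own + [make_record(b"scraped", "https://example.com/img", "unknown")]
print(sole_creator(own))
print(sole_creator(mixed))
```

The manifest only records what the uploader claims, so it shows where a dataset is said to come from; it doesn't prove it.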

[–] [email protected] 3 points 2 years ago

That's not the take (although in a sense I agree training data should influence it, especially if the output materially reproduces training samples).

Instead, the argument is that individual outputs from ML can only be copyrighted if they carry human expression (because that's what the law is specifically meant to cover), that is, if there's enough creative height in the inputs that the resulting output carries that expression.

Compare to photography: photographs aren't protected automatically just because a button is pressed and an image is captured. Rather, you gain copyright protection as a result of your choice of motif, which carries your expression.

Overly simple prompts to ML models would, under this ruling, be considered comparable to uncopyrightable lists of facts (like a recipe), and thus the corresponding output is also not protected.