[–] [email protected] 11 points 5 days ago* (last edited 5 days ago) (1 children)

Wow, EFF. You've been a beacon of light in countless fights, but I did a double take on this article. Are you really implying that anything posted on the internet is subject to a business free-for-all?

I had to have read that wrong. It is absolutely the responsibility of any creative business to track and audit all copyrighted works used in deliverables.

AI companies, being in the business of scooping up massive amounts of data, should absolutely keep some sort of metadata log referencing the copyrighted works they ingest. This isn't an undue burden on small businesses; it should be standard practice for AI.
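(For concreteness, here's a minimal sketch of what that kind of provenance log could look like, assuming a simple JSON Lines file; the field names, license tags, and example URL are purely illustrative, not anything any vendor actually does.)

```python
# Illustrative sketch only: one provenance record per ingested document,
# appended to a JSON Lines file. Field names and license tags are hypothetical.
import hashlib
import json
from datetime import datetime, timezone

def log_training_document(text: str, source_url: str, license_tag: str,
                          log_path: str = "training_provenance.jsonl") -> None:
    """Append a metadata record for a document added to the training corpus."""
    record = {
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "source_url": source_url,
        "license": license_tag,  # e.g. "CC-BY-4.0" or "all-rights-reserved"
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "length_chars": len(text),
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")

# Usage (hypothetical):
# log_training_document(article_text, "https://example.com/story", "all-rights-reserved")
```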

> AI is like reading and should be fair use

No, it certainly is not. Creating a compressed, efficient database for search engines to reference and point users back to the original work is fair use. Using that database to generate new work is not. AI is inherently generative.
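(Rough sketch of the distinction I mean, purely illustrative: a search index stores pointers back to the originals and answers queries with references, not with newly generated text.)

```python
# Illustrative toy example: an inverted index maps words to the documents
# that contain them, so a query returns references to the originals rather
# than generating new derivative text. Toy data, not a real search engine.
from collections import defaultdict

documents = {
    "doc1": "the raven by edgar allan poe",
    "doc2": "search engines index pages and point users to them",
}

inverted_index: defaultdict[str, set[str]] = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.split():
        inverted_index[word].add(doc_id)

print(sorted(inverted_index["poe"]))    # ['doc1'] -- a pointer to the source
print(sorted(inverted_index["index"]))  # ['doc2']
```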

[–] [email protected] 8 points 5 days ago (1 children)

Spoken like someone who either didn't read the article or has a deep misunderstanding of what AI training is.

[–] [email protected] 7 points 5 days ago* (last edited 5 days ago) (2 children)

Enlighten me. I hope I read it wrong.

It sounds like the EFF is advocating stripping/ignoring copyright information (as is currently done) when training LLMs, to ease the burden on small startups of tracking down copyright owners. That's something I had to do in production work, and yeah, it sucked, but it's how it works. (Radio is a tad different.)

[–] [email protected] 8 points 5 days ago (1 children)

I recommend reading this article by Cory Doctorow, and this one by Katherine Klosek, the director of information policy and federal relations at the Association of Research Libraries.

[–] [email protected] 3 points 5 days ago (1 children)

The first article has some good points, if taken very literally. I can see how they arrive at some of their conclusions; they break it down step by step very well. Copyright is murky as hell, I'll give them that, but the final generated product is what matters in court.

The second paper, while well written, is more of a press piece. But they do touch on one important part relevant to this conversation:

> The LCA principles also make the careful and critical distinction between input to train an LLM, and output—which could potentially be infringing if it is substantially similar to an original expressive work.

This is important because a prompt like "create a picture of ____ in the style of _____" can absolutely generate output from specific sampled copyrighted material, for which courts have required royalty payments in the past. A model can also sample a voice actor's voice so accurately that it can be confused with the real thing; there have been union strikes over this.

All in all, this is new territory, part of the fun of evolving laws. If you remove the generative part of AI, would that be enough?

[–] [email protected] 7 points 5 days ago (1 children)

The funny part is that most of the headlines want you to believe that using things without permission is automatically against copyright, when in reality fair use is part of copyright law, and it's the reason our discourse isn't wholly controlled by mega-corporations and the rich. It's sad watching people desperately try to become the kind of system they're against.

[–] [email protected] 1 points 5 days ago* (last edited 5 days ago)

Fair use is based on a four-factor analysis that considers the purpose of the use, the nature of the copyrighted work, the amount used, and the effect on the market for the original work.

It is ambiguous and limited, tested on a case-by-case basis, which is what makes this moment in copyright law so interesting.

[–] [email protected] 2 points 4 days ago

It's saying that copyright law doesn't apply to AI training, because none of the data is copied. It's more akin to a person reading an impossible amount at an impossible speed, then using what they read as inspiration for their own writing. Sure, you could ask an LLM trained on, say, Edgar Allan Poe's works to recite the entirety of The Raven, but it can only "recall" it the way a human does, and it will have just as many mistakes (probably more, really) in its recitation as a human would.
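(A toy illustration of that point, nothing like how an actual LLM is trained: a bigram model learns word-transition statistics from a text and generates by sampling from those statistics rather than replaying a stored copy. The corpus here is just the opening line of The Raven.)

```python
# Toy sketch: learn bigram (word-to-next-word) statistics from a text and
# generate by walking them. Output follows the source's patterns but is not
# a verbatim stored copy. Nothing like real LLM training; illustrative only.
import random
from collections import defaultdict

corpus = ("once upon a midnight dreary while i pondered weak and weary "
          "over many a quaint and curious volume of forgotten lore").split()

transitions: defaultdict[str, list[str]] = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def generate(start: str, max_words: int = 12) -> str:
    """Sample a word sequence from the learned transition table."""
    word, out = start, [start]
    for _ in range(max_words):
        if word not in transitions:
            break
        word = random.choice(transitions[word])
        out.append(word)
    return " ".join(out)

print(generate("once"))  # e.g. "once upon a quaint and weary over many a midnight dreary ..."
```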