this post was submitted on 20 Mar 2025
63 points (100.0% liked)

Opensource

2239 readers
92 users here now

A community for discussion about open source software! Ask questions, share knowledge, share news, or post interesting stuff related to it!

CreditsIcon base by Lorc under CC BY 3.0 with modifications to add a gradient



founded 2 years ago
MODERATORS
top 2 comments
sorted by: hot top controversial new old
[–] [email protected] 1 points 10 hours ago

and include expensive endpoints like git blame, every page of every git log, and every commit in your repository. They do so using random User-Agents from tens of thousands of IP addresses, each one making no more than one HTTP request, trying to blend in with user traffic.

That's insane. They also mention crawling happening every 6 hours instead of only once. And the vast majority of traffic coming from a few AI companies.

It's a shame. The US won't regulate - and certainly not under the current administration. China is unlikely to.

So what can be done? Is this how the internet splits into authorized and not? Or into largely blocked areas? Maybe responses could include errors that humans could identify and ignore but LLMS would not to poison them?

When you think about the economic and environmental cost of this it's insane. I knew AI is expensive to train and run. But now I have to consider where they leech from for training and live queries too.

[–] [email protected] 7 points 20 hours ago

AI is a cancer on the internet, how many FOSS projects will wither on the vine because dealing with this is too expensive, annoying or difficult?