zutto

joined 1 year ago
[–] [email protected] 40 points 2 days ago (3 children)

It's supposed to be a dot (.) character. The project's name is n.eko.

[–] [email protected] 18 points 2 months ago

They're working on no-js support too, but this just had to be put out without it due to the amount of AI crawler bots causing denial of service to normal users.

[–] [email protected] 17 points 2 months ago (1 children)
  1. It doesn't run against Firefox only; it runs against whatever you configure it to. Also, from personal experience, I can tell you that the majority of AI crawlers have the keyword "Mozilla" in their user agent.

  2. Yes, this isn't Cloudflare, but I'm pretty sure that's on the to-do list. If not, please open an issue with the project.

  3. The computational requirements on the server side are a small fraction of what the bots have to spend, literally. A non-issue. This tool is meant to combat the denial of service that these bots cause by accessing high-cost services, such as git blame on GitLab. My phone can do 100k SHA-256 sums per second (single-threaded), and you can safely assume any server will outperform this ARM chip, so you'd need so many resources to cause a denial of service that you might as well overload the server with plain traffic instead of one SHA-256 calculation.


And this isn't really comparable to Tor. This is a self-hostable service that sits between your web server/CDN and the service that is being attacked by mass crawling.

[–] [email protected] 37 points 2 months ago* (last edited 2 months ago) (21 children)

Yes, Anubis uses proof of work, as some cryptocurrencies do, to slow down/mitigate mass-scale crawling by making the crawlers do expensive computation.
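
To make the idea concrete, here is a rough sketch of a hash-based proof of work. This is not Anubis's actual code or challenge format; the function names, the challenge string, and the "leading zero hex digits" rule are all assumptions for illustration. The point is only the asymmetry: the client has to try many hashes, while the server checks the answer with a single one.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> int:
    """Client side (hypothetical): brute-force the smallest nonce whose
    SHA-256 digest of challenge+nonce starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side (hypothetical): a single hash is enough to check the answer."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    challenge = "example-challenge"  # assumed value; a real server would issue a random one
    nonce = solve(challenge, difficulty=4)
    print(nonce, verify(challenge, nonce, difficulty=4))
```

Each extra zero digit multiplies the expected client work by 16, while the server's check stays at one hash.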

https://lemmy.world/post/27101209 has a great article attached to it about this.

--

Edit: Just to be clear, this doesn't mine any crypto; it just uses the same idea to slow down the requests.

 

I just started using this myself, seems pretty great so far!

It clearly doesn't stop all AI crawlers, but it does stop a significant chunk of them.

[–] [email protected] 3 points 2 months ago

And replaced the word "AI" with "Apple". ( ͡° ͜ʖ ͡°)

[–] [email protected] 5 points 2 months ago (1 children)

Umm, what you are describing is quite literally hallucination? Am I missing something here?

[–] [email protected] 3 points 2 months ago (3 children)

All models hallucinate, it's just how language models work.

Do you have sources for this claim that Mistral's models are trying to deceive anyone?

[–] [email protected] 1 points 3 months ago

In general, to everyone who finds Yacy an interesting project: just give it a try!

It's relatively lightweight, and having millions of pages indexed does not take that much disk space. In my case, 3.5 million indexed pages take only around 200 gigabytes.

Yacy is far from perfect, and it's an ancient project. But it's still alive and kicking strong!

[–] [email protected] 5 points 3 months ago (2 children)

Hi!

I've been self-hosting Yacy for some years, even though I rarely use it (I'm mostly using Kagi these days).

But some tips:

  • Set up something like this in your browser; it sends Yacy to crawl the pages that you visit: https://github.com/JeremyRand/YaCyIndexerGreasemonkey
  • Get familiar with blacklists and try to find some public ones to filter out bad sites and adult content.
  • Tinker with Ranking & heuristics -> Solr boosts to get results that fit your use case more.
  • And in general, tinker with all the settings you can find!

And, not directly Yacy related, you can use your own Yacy instance through Searxng as well, even in 'private' (non-P2P) mode.

[–] [email protected] 79 points 3 months ago (11 children)

Prediction: This change goes live and people make an uproar about it. Then they forget about it in a few days and continue using Reddit.

The same old story keeps happening with Reddit, Twitter/X, etc.

Hopefully we do receive some refugees to Lemmy!

[–] [email protected] 24 points 4 months ago

Squid Game reference (or from one of the knockoffs).

[–] [email protected] 10 points 5 months ago (1 children)