this post was submitted on 14 Feb 2024
1079 points (100.0% liked)

Technology

69545 readers
3680 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related news or articles.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
  9. Check for duplicates before posting, duplicates may be removed
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 235 points 1 year ago (27 children)

Put something in robots.txt that isn't supposed to be hit and is hard to hit by non-robots. Log and ban all IPs that hit it.

Imperfect, but can't think of a better solution.

[–] [email protected] 12 points 1 year ago (9 children)

robots.txt is purely textual; you can't run JavaScript or log anything. Plus, one who doesn't intend to follow robots.txt wouldn't query it.

[–] [email protected] 15 points 1 year ago (1 children)

You're second point is a good one, but you absolutely can log the IP which requested robots.txt. That's just a standard part of any http server ever, no JavaScript needed.

[–] [email protected] 10 points 1 year ago

You'd probably have to go out of your way to avoid logging this. I've always seen such logs enabled by default when setting up web servers.

load more comments (7 replies)
load more comments (24 replies)