Once I had a module that did this with MySQL... It automatically added a
honeypot line to robots.txt for you (or served one up if it detected a
404 response for /robots.txt) and, if the honeypot was triggered, added
the remote IP to a MySQL table. It also had an access_handler installed
that scanned the remote IP against those in the DB and denied access if
the IP was in the "blacklist".

Chris Knight wrote:

> On Tue, Dec 16, 2008 at 3:13 PM, Peter Horn <peter.horn@xxxxxxxxxxx> wrote:
>
>> I don't think this is quite off-topic, just a bit left of centre. :-\
>> I run a small site with two subdomains of no-ip.org (like dyndns)
>> using NameVirtualHost. Looking at the access log, a few percent of my
>> traffic was from bots like Morfeus F***ing Scanner [my censorship],
>> intrusion attempts (e.g. GET /login_page.php) and just plain old
>> "wrong numbers". Nothing from what I'd think of as "good" bots
>> (Google, etc.). Initially, I added a first (i.e. default) vhost to
>> serve a page saying "If you don't know the URL, I'm not telling you."
>> Then I refined this with the obvious "Deny from all".
>
> I suppose this is something you can do now. When I first started using
> name-based virtual hosting, my first vhost was a simple page that
> informed the reader they had hit it because their browser did not
> support HTTP/1.1 requests, with links to the latest browsers. I only
> got bitten by this once, when a friend using a Hughes satellite
> connection that relied on an HTTP/1.0 proxy to improve perceived speed
> couldn't get to her sites and got really, really, really mad at me.
>
>> While this is definitely effective, do you consider it
>> honourable/ethical/sneaky/clever/dumb/whatever? Are there any likely
>> side-effects?
>
> My opinion is that it is your server and you can do what you want with
> it. I have always been bothered by the "robot exclusion protocol",
> because the concept is that any commercial business can scan and copy
> your content by default unless you find them and exclude them.
> archive.org is a personal pet peeve of mine, though I am sure I am in
> the minority there.
>
> With the goal of catching the bad bots, here is another idea. Create a
> subdirectory off your site that holds a single index.php (or whatever
> your preferred server-side scripting language is) and have that file
> append a "Deny from [REMOTE_ADDR of the request]" line to the site's
> .htaccess file. Then put that directory in your robots.txt file. Only
> the really evil bots deliberately crawl the excludes in a robots.txt,
> and once they do, you'll be blocking their requests.
>
> -Chris
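For reference, the catch-all default vhost Peter describes comes out to
something like this under 2.2 (an untested sketch, with placeholder
ServerName and DocumentRoot values):

    # The first vhost listed for *:80 becomes the default, so it catches
    # any request whose Host header doesn't match a real site.
    NameVirtualHost *:80

    <VirtualHost *:80>
        # Placeholder name; it never needs to resolve.
        ServerName default.invalid
        DocumentRoot /var/www/empty
        <Location />
            Order allow,deny
            Deny from all
        </Location>
    </VirtualHost>

    <VirtualHost *:80>
        # One of the real no-ip.org subdomains goes here.
        ServerName example.no-ip.org
        DocumentRoot /var/www/example
    </VirtualHost>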
I can't find that module these days, though... If people really want it, maybe I'll rewrite it from scratch as a 2.2 module...
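In the meantime, here is a rough, untested sketch of the by-hand trap
Chris describes. The /trap/ path and file locations are placeholders,
and it assumes 2.2-style Order/Allow/Deny, AllowOverride Limit for the
docroot, and an .htaccess the httpd user can write to (which is itself
a risk worth weighing):

    # robots.txt: only a bot that ignores the exclusion should request /trap/
    User-agent: *
    Disallow: /trap/

    # .htaccess in the docroot; with Order allow,deny, the appended Deny
    # lines win over "Allow from all" for matching addresses
    Order allow,deny
    Allow from all

    <?php
    // /trap/index.php: whoever fetches this honeypot URL gets a
    // "Deny from" line appended to the site's .htaccess.
    $ip = $_SERVER['REMOTE_ADDR'];

    // Only ever write something that looks like an IP address.
    if (filter_var($ip, FILTER_VALIDATE_IP)) {
        $htaccess = $_SERVER['DOCUMENT_ROOT'] . '/.htaccess';
        // FILE_APPEND + LOCK_EX so concurrent hits don't interleave lines.
        file_put_contents($htaccess, "Deny from $ip\n", FILE_APPEND | LOCK_EX);
    }

    header('HTTP/1.1 403 Forbidden');
    echo "Nothing to see here.\n";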
Issac