On Tue, Dec 16, 2008 at 3:13 PM, Peter Horn <peter.horn@xxxxxxxxxxx> wrote:
> I don't think this is quite off-topic, just a bit left of centre. :-\
> I run a small site with two subdomains of no-ip.org (like dyndns) using
> NameVirtualHost. Looking at the access log, a few percent of my traffic was
> from bots like Morfeus F***ing Scanner [my censorship], intrusion attempts
> (e.g. GET /login_page.php) and just plain old "wrong numbers". Nothing from
> what I'd think of as "good" bots (Google, etc.) Initially, I added a first
> (i.e. default) vhost to serve a page saying "If you don't know the URL, I'm
> not telling you." Then I refined this with the obvious "Deny from all".

I suppose this is something you can do now. When I first started using
name-based virtual hosting, my first vhost was a simple page informing the
reader that they had hit it because their browser did not support HTTP/1.1
requests, with links to the latest browsers. I only got bitten by this once,
when a friend using a Hughes satellite connection that utilized an HTTP/1.0
proxy to improve perceived speed couldn't get to her sites and got really
really really mad at me.

> While this is definitely effective, do you consider it
> honourable/ethical/sneaky/clever/dumb/whatever? Are there any likely
> side-effects?

My opinion is that it is your server and you can do what you want with it.
I have always been bothered by the 'robots exclusion protocol' because the
concept is that any commercial business can scan and copy your content by
default, unless you find them and exclude them. archive.org is a personal
pet peeve of mine, though I am sure I am in the minority there.

With the goal of catching the bad bots, here is another idea. Create a
subdirectory off your site that holds a single index.php (or whatever your
preferred server-side scripting language is) and have that file append a
"Deny from [REMOTE_ADDR of the request]" line to the site's .htaccess file.
Then list that directory as disallowed in your robots.txt. Only the really
evil bots deliberately crawl the paths excluded in a robots.txt, and once
they do you'll be blocking their requests.

-Chris
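
P.S. For anyone following along, a catch-all default vhost along the lines
Peter describes might look roughly like this. Untested sketch, Apache
2.2-style access control; the server name and paths are only placeholders:

    NameVirtualHost *:80

    # The first vhost defined is the default: any request whose Host
    # header matches no other vhost lands here and gets a 403.
    <VirtualHost *:80>
        ServerName catchall.example
        DocumentRoot /var/www/catchall
        <Directory /var/www/catchall>
            Order Allow,Deny
            Deny from all
        </Directory>
    </VirtualHost>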
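
P.P.S. A rough, untested sketch of the robots.txt trap. It assumes Apache
2.2-style "Deny from" lines, that the web server user may write to the
.htaccess, and the /trap/ directory name is just an example:

    # robots.txt
    User-agent: *
    Disallow: /trap/

    <?php
    // /trap/index.php
    // Anything that ignores robots.txt and fetches this page gets its
    // address appended to the site's .htaccess as a "Deny from" line.
    $htaccess = $_SERVER['DOCUMENT_ROOT'] . '/.htaccess';
    $addr     = $_SERVER['REMOTE_ADDR'];

    // Only write out something that actually looks like an IP address.
    if (filter_var($addr, FILTER_VALIDATE_IP)) {
        file_put_contents($htaccess, "Deny from $addr\n",
                          FILE_APPEND | LOCK_EX);
    }

    header('HTTP/1.0 403 Forbidden');
    echo 'Forbidden';
    ?>

For the appended lines to take effect, the .htaccess would already need
something like "Order Allow,Deny" and "Allow from all" above them, and it is
worth keeping an eye on the file so you don't accidentally lock out a
legitimate address.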