RE: 404's to robots.txt?


 



> -----Original Message-----
> From: Evan Platt [mailto:evan@xxxxxxxxxxxxxxxxxx] 
> Sent: Wednesday, July 22, 2009 1:56 AM
> To: users@xxxxxxxxxxxxxxxx
> Subject:  404's to robots.txt?
> 
> So I've noticed quite a lot of connections from web spider programs.
> I've had a robots.txt containing
>
>   User-agent: *
>   Disallow: /
>
> for a long time. But looking more closely at my Apache logs, am I
> reading them right that it's giving a 404?

Yes.
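As a rough illustration of why the 404 matters: a crawler that gets a 404 for robots.txt will typically treat the site as having no restrictions at all, which is the opposite of what "Disallow: /" asks for. The sketch below uses Python's standard-library urllib.robotparser as a stand-in for a crawler. That is an assumption; nothing here documents msnbot's own behaviour. RobotFileParser.read() sets allow_all when the fetch fails with a 4xx other than 401/403, and parsing an empty rule list approximates that offline.

```python
from urllib.robotparser import RobotFileParser

url = "http://www.espphotography.com/"  # the site's front page

# The rules from the posted robots.txt: every user agent, everything disallowed.
served = RobotFileParser()
served.parse(["User-agent: *", "Disallow: /"])

# A crawler that got a 404 has no rules at all. An empty parse() stands in
# for that offline (RobotFileParser.read() sets allow_all on such a 404).
missing = RobotFileParser()
missing.parse([])

print(served.can_fetch("msnbot", url))   # False: the whole site is off limits
print(missing.can_fetch("msnbot", url))  # True: nothing is off limits
```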

How many virtual hosts (VHs) do you have? If robots.txt exists in one VH
but the request comes in to another VH, you will get a 404. Try adding
%{Host}i to your log format to see the Host header sent by the client.
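Once the Host header is in the log, a short script can group the robots.txt hits by Host and status, which makes a per-VH split easy to spot. This is a sketch under stated assumptions: the log format nickname "combinedhost" and the exact LogFormat line in the comment are hypothetical, and the Host values in the sample are made up for illustration.

```python
import re
from collections import Counter

# Assumed log format (hypothetical nickname "combinedhost"): Apache's usual
# "combined" format with the request's Host header appended as a final
# quoted field, configured with something like:
#   LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Host}i\"" combinedhost
#   CustomLog logs/access_log combinedhost
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" "(?P<host>[^"]*)"$'
)


def robots_status_by_host(lines):
    """Count (Host header, HTTP status) pairs for requests for /robots.txt."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not in the assumed format; ignore it
        parts = m.group("request").split()
        if len(parts) >= 2 and parts[1] == "/robots.txt":
            counts[(m.group("host"), m.group("status"))] += 1
    return counts


# Hypothetical sample: the Host values are made up for illustration.
sample = [
    '65.55.106.173 - - [21/Jul/2009:09:44:43 -0700] "GET /robots.txt HTTP/1.1" '
    '404 208 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" "example.org"',
    '65.55.106.160 - - [21/Jul/2009:11:09:07 -0700] "GET /robots.txt HTTP/1.1" '
    '200 28 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" "www.example.org"',
]

if __name__ == "__main__":
    for (host, status), n in sorted(robots_status_by_host(sample).items()):
        print(f"{host}: {status} x{n}")
```

If every 404 lines up with one Host value and every 200 with another, the VH explanation fits.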

Rgds,
Owen Boyle
Disclaimer: Any disclaimer attached to this message may be ignored. 

> 
> 65.55.106.173 - - [21/Jul/2009:09:44:43 -0700] "GET /robots.txt HTTP/1.1" 404 208 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
> 65.55.106.112 - - [21/Jul/2009:10:11:43 -0700] "GET /robots.txt HTTP/1.1" 404 208 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
> 65.55.106.166 - - [21/Jul/2009:11:03:35 -0700] "GET /robots.txt HTTP/1.1" 404 208 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
> 65.55.106.160 - - [21/Jul/2009:11:09:07 -0700] "GET /robots.txt HTTP/1.1" 200 28 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
> 65.55.106.180 - - [21/Jul/2009:11:35:34 -0700] "GET /robots.txt HTTP/1.1" 404 208 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
> 
> Same day, no changes made:
> X.X.X.X - - [21/Jul/2009:16:47:44 -0700] "GET /robots.txt HTTP/1.1" 304 - "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.1) Gecko/20090715 Firefox/3.0.7, Ant.com Toolbar 1.3 (.NET CLR 3.5.30729)"
> Z.Z.Z.Z - - [21/Jul/2009:16:49:10 -0700] "GET /robots.txt HTTP/1.1" 200 28 "-" "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/530.5 (KHTML, like Gecko) Chrome/2.0.172.30 Safari/530.5"
> 
> Two different IPs: one mine, one a friend's.
> 
> Any suggestions as to why (if I'm reading the log right) I'm handing 
> out a 404 to what appear to be only web crawlers?
> 
> # httpd -v
> Server version: Apache/2.2.3
> Server built:   Jun 16 2009 11:28:50
> 
> I don't know what other information is needed to help troubleshoot...
> Running on an OS X box.
> http://www.espphotography.com/robots.txt if you want to take a look...
> 
> Thanks. :)
> 
> Evan
> 
> 
> ---------------------------------------------------------------------
> The official User-To-User support forum of the Apache HTTP 
> Server Project.
> See <URL:http://httpd.apache.org/userslist.html> for more info.
> To unsubscribe, e-mail: users-unsubscribe@xxxxxxxxxxxxxxxx
>    "   from the digest: users-digest-unsubscribe@xxxxxxxxxxxxxxxx
> For additional commands, e-mail: users-help@xxxxxxxxxxxxxxxx
> 
> 
 

