RE: Scrubbing log files

> Are there any lists of common robots on the net?  Are there 
> some regular expressions or searches that would help? Are 
> there known IP addresses that are safe to discard?

I believe your question is off topic for this forum; however, I'll share our
joy with you.

Some are known by hostname:
http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html

others by IP:
http://www.cuil.com/info/webmaster_info/ 
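For bots that publish their IP addresses, a rough sketch of dropping their hits from an access log might look like the following. The networks listed are placeholder documentation ranges (RFC 5737), not real crawler addresses, and the function names are made up for illustration; it also assumes the client IP is the first field on each log line, as in the common Apache log formats.

```python
import ipaddress

# Hypothetical ranges (RFC 5737 documentation addresses), NOT real crawler IPs.
# Replace these with the ranges each bot operator actually publishes.
KNOWN_BOT_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.16/28"),
]

def is_known_bot_ip(addr):
    """Return True if addr parses as an IP inside one of the listed networks."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # not an IP address at all (e.g. a hostname)
    return any(ip in net for net in KNOWN_BOT_NETWORKS)

def scrub(lines):
    """Yield only the log lines whose first field is not a known bot IP."""
    for line in lines:
        fields = line.split(None, 1)
        if fields and is_known_bot_ip(fields[0]):
            continue
        yield line
```

Something like `scrub(open("access.log"))` would then give you the remaining lines to feed to your stats package.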

We whitelist certain bots; others, if they crawl too fast and don't obey
robots.txt, get banned.  Maintaining these lists is a lot of ongoing work,
especially if the bot company identifies its crawlers by plain IP addresses
instead of
http://en.wikipedia.org/wiki/Forward-confirmed_reverse_DNS which Google,
MSN, Yahoo, etc. use and which is much more flexible.
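
For what it's worth, a forward-confirmed reverse DNS check can be sketched in a few lines: look up the hostname for the IP, look up the addresses for that hostname, and accept the bot only if the original IP is among them and the hostname ends in a domain you trust. This is a minimal sketch, not our production code. The function names and the suffix list are assumptions for illustration (check each search engine's own documentation for the domains it uses), and socket.gethostbyname_ex only handles IPv4.

```python
import socket

# Assumed suffixes for illustration; confirm against each engine's documentation.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")

def fcrdns_hostname(ip, reverse=socket.gethostbyaddr,
                    forward=socket.gethostbyname_ex):
    """Return the hostname for ip if the forward lookup confirms it, else None.

    The reverse and forward parameters exist only so the resolvers can be
    swapped out (for example in tests); they default to the stdlib calls.
    """
    try:
        hostname = reverse(ip)[0]          # reverse (PTR) lookup
        addresses = forward(hostname)[2]   # forward (A) lookup, IPv4 only
    except OSError:
        return None                        # lookup failed: treat as unverified
    return hostname if ip in addresses else None

def is_verified_bot(ip, suffixes=TRUSTED_SUFFIXES, **resolvers):
    """True if ip passes FCrDNS and its hostname ends in a trusted suffix."""
    hostname = fcrdns_hostname(ip, **resolvers)
    if hostname is None:
        return False
    return hostname.lower().rstrip(".").endswith(suffixes)
```

DNS lookups are slow, so in practice you would probably cache the result per IP rather than run this on every log line.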

Some code & thoughts to keep you busy:
http://www.webmasterworld.com/google/3092423.htm
http://www.webmasterworld.com/php/3606836.htm

Thanks,

http://www.t1shopper.com/


