Scrubbing log files

It's a shared Apache 2 server that's set up to put daily log files in my home directory, and I can't muck with the config files. What I'm trying to do is remove the entries left by spiders, robots, and other requests that don't matter to me. My Perl script currently looks for the IP addresses that request /robots.txt and removes all other entries from those addresses. But that doesn't catch entries from the likes of Yahoo, which tells me that the word "slurp" is the thing to look for in the browser identification (User-Agent) field. That part works.
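
In case it helps to see it, here's roughly what the script boils down to. This is a minimal sketch; the assumption that the client IP is the first field (common/combined log format) is mine:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $log = shift @ARGV or die "usage: $0 access_log\n";

    # Pass 1: collect the IP addresses that fetched /robots.txt.
    my %bot_ip;
    open my $in, '<', $log or die "open $log: $!\n";
    while (<$in>) {
        my ($ip, $request) = /^(\S+) \S+ \S+ \[[^\]]*\] "([^"]*)"/ or next;
        $bot_ip{$ip} = 1 if $request =~ m{^\w+ /robots\.txt(?:[ ?]|$)};
    }
    close $in;

    # Pass 2: keep only lines that neither come from those IPs nor
    # identify themselves as Yahoo's Slurp.
    open $in, '<', $log or die "open $log: $!\n";
    while (<$in>) {
        my ($ip) = /^(\S+)/;
        next if defined $ip && $bot_ip{$ip};
        next if /slurp/i;
        print;
    }
    close $in;

I run it as "perl scrub.pl access_log > clean_log" (scrub.pl is just what I call it here).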

But I find I'm also matching on "bot", "spider", and a few others, and I'm afraid those will pull out entries I'd rather keep because of accidental matches.
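
To limit the damage I've been experimenting with testing only the User-Agent field rather than the whole line, and requiring a word boundary after each keyword. A minimal sketch (the keyword list is just what I've collected so far, and it again assumes the combined log format, with the User-Agent as the last quoted field):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Require a word boundary *after* each keyword: "Googlebot/2.1" and
    # "Baiduspider" still match, but "botany" somewhere in a URL or
    # referrer does not. Not bulletproof, just fewer false hits.
    my $ua_filter = qr/(?:bot|spider|crawler|slurp)\b/i;

    while (<>) {
        # In the combined format the User-Agent is the last quoted
        # field; testing only that field avoids hits in the request
        # or referrer.
        my ($ua) = /"([^"]*)"\s*$/;
        print unless defined $ua && $ua =~ $ua_filter;
    }

Filtering on the User-Agent field alone at least means that "bot" appearing in a requested URL no longer knocks the entry out. But none of this catches crawlers that don't identify themselves at all, hence my questions: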

Are there any lists of common robots on the net?  Are there some regular expressions or searches that would help? Are there known IP addresses that are safe to discard?

-- 

--> A fair tax is one that you pay but I don't <--
