This doesn't show all the IPs, only the IPs from the last 5000 requests. If you want the IPs from every request in the access log, use:

grep -w "403" access.log | awk '{print $1}' | sort | uniq > file.txt

On Fri, Mar 20, 2009 at 11:48 AM, Frog <frog@xxxxxxxx> wrote:
>
> Hello all,
>
> Thank you Chris for the suggestion. It helped enormously. I have
> extracted the data I was looking for by using the following:
>
> tail -n 5000 access.log | grep "403" | awk '{print $1}' | uniq -d > file.txt
>
> Best regards
> Frog.
>
> ----- Original Message -----
> From: "Chris Robertson" <crobertson@xxxxxx>
> To: squid-users@xxxxxxxxxxxxxxx
> Sent: Thursday, 19 March, 2009 21:37:25 GMT +00:00 GMT Britain, Ireland, Portugal
> Subject: Re: Extracting selected data from logfile
>
> Frog wrote:
> > Hello All,
> >
> > Hopefully someone may be able to assist me.
> >
> > I have Squid set up here as a reverse proxy. I have logging configured
> > using the following settings in squid.conf:
> >
> > logformat combined %>a %ui %un [%tl] "%rm %ru HTTP/%rv" %Hs %<st "%{Referer}>h" "%{User-Agent}>h" %Ss:%Sh
> > access_log /var/log/squid/access.log combined
> >
> > To block certain bots and bad user agents I have the following:
> >
> > acl badbrowsers browser "/etc/squid/badbrowsers.conf"
> > http_access deny badbrowsers
> >
> > The http_access deny returns a 403 to any visitor that meets the criteria
> > in badbrowsers.conf, and this works perfectly. But I would like to take
> > this one step further: I would like to build, in real time if possible,
> > a blacklist of IP addresses that have been served a 403 error.
> >
> > Unfortunately my knowledge of most of the popular scripting languages is
> > non-existent, so I was wondering if something like a redirector could be
> > configured to meet my needs?
> >
> > I have looked at fail2ban, however it doesn't seem to parse my log files
> > even if I change the Squid log format to common.
> >
> > Basically, I am wondering if there is a way to parse the logfile and
> > append to a new file any IP address that was served a 403.
>
> Something like...
>
> tail -n 5000 /path/to/access.log | grep "HTTP/[^\"]*\" 403" | awk '{print $1}'
>
> ...run from the command line should (on my GNU/Linux machine) search the
> last 5000 lines (tail -n 5000) of the file at /path/to/access.log for the
> string "HTTP/" followed by any number of characters that are NOT a double
> quote, followed by a double quote, a space, and the string "403"
> (grep "HTTP..."). The first column of any matching line will then be
> printed (awk '{print $1}').
>
> This is in no way tested, and obviously does not append to a file or run
> automatically.
>
> > Thank you in advance for any pointers.
> >
> > Frog.
>
> Chris
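
To take the thread one step further and actually build the 403 blacklist automatically, the pieces above can be tied together in a small shell script. This is only an untested sketch: the log path, the blacklist location (/etc/squid/blacklist-403.txt) and the reliance on GNU tail -F, grep --line-buffered and awk's fflush() are assumptions, not anything confirmed in the thread.

  #!/bin/sh
  # Sketch: follow the Squid access log and append the client IP of every
  # request answered with a 403 to a blacklist file, once per IP.
  # Both paths below are assumptions; adjust to your own setup.
  LOG=/var/log/squid/access.log
  BLACKLIST=/etc/squid/blacklist-403.txt

  touch "$BLACKLIST"

  # tail -F keeps following the file across log rotation (GNU coreutils);
  # the grep pattern is the one Chris suggested: the quoted request line,
  # then a space and the 403 status code. fflush() (gawk/mawk) stops awk
  # from buffering its output.
  tail -F -n 0 "$LOG" \
    | grep --line-buffered "HTTP/[^\"]*\" 403 " \
    | awk '{ print $1; fflush() }' \
    | while read -r ip; do
        # Only append an IP that is not already on the blacklist.
        grep -qxF "$ip" "$BLACKLIST" || echo "$ip" >> "$BLACKLIST"
      done

The resulting file could then be fed back into Squid with a file-based ACL in the same style as the badbrowsers one (for example acl blacklist403 src "/etc/squid/blacklist-403.txt" followed by http_access deny blacklist403), bearing in mind that Squid only rereads such files on a reconfigure (squid -k reconfigure).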