Dave wrote:
Hi,
Thanks for your reply. The following is the IP and abbreviated message:
(reason: 554 5.7.1 Service unavailable; Client host [65.24.5.137]
blocked using dnsbl-1.uceprotect.net;
On my Squid issue: if aufs is less intensive and more efficient I'll
definitely switch over to it. As for your suggestion about splitting
into multiple files, I believe the version I have can do this; it has
multiple acl statements for the safe_ports definition. My issue, though,
is that there are 15,000+ lines in this file, and on investigation some
500 are duplicates. I'd rather not go through this manually to do the
split. Is there a way I can split based on the dst, dstdomain, or
url_regex you referenced?
I just used the following commands; they pulled off most of the job in a
few minutes. The remainder left as regex patterns was small. There are
some that duplicate the domain-only list, but those can be dealt with
later (see the sketch after the commands).
# Pull out the IPs
grep -v -E "[a-z]+" porn | sort -u >porn.ipa
# copy everything else into a temp file
grep -E "[a-z]+" porn | sort -u >temp.1
# pull out lines with only a domain name
grep -E "^([0-9a-z-]+\.)+[a-z]+$" temp.1 | sort -u >temp.d
# pull out everything without a bare domain name into another temp
grep -v -E "^([0-9a-z-]+\.)+[a-z]+$" temp.1 | sort -u >temp.2
rm temp.1
# pull out lines that are domain/ or domain<space> and drop the trailing character
grep -E "^([0-9a-z-]+\.)+[a-z]+[/ ]$" temp.2 | sed -e 's/\/$//' -e 's/ $//' | sort -u >>temp.d
# leave the rest as regex patterns
grep -v -E "^([0-9a-z-]+\.)+[a-z]+[/ ]$" temp.2 | sort -u >porn.regex
rm temp.2
# sort the just-domains file and make sure there are no duplicates
sort -u temp.d > porn.domains
rm temp.d
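The regex leftovers that duplicate the domain-only list can be filtered
out later with something like this (a rough sketch, not part of the
commands above: it drops any regex entry containing a domain the domain
list already covers, and porn.regex.clean is just an example name; -w
limits matches to word boundaries, but a near-miss domain could still be
caught, so eyeball the result before using it):

# treat each domain as a fixed string (-F), match only at word
# boundaries (-w), and keep (-v) the regex lines that contain
# none of the already-listed domains
grep -v -F -w -f porn.domains porn.regex > porn.regex.clean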
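Once split, the three files can be loaded as separate ACLs, roughly like
this (a sketch only; the /etc/squid/ paths and ACL names are made up for
illustration):

# squid.conf fragment: one acl per file
acl porn_ip dst "/etc/squid/porn.ipa"
acl porn_dom dstdomain "/etc/squid/porn.domains"
acl porn_url url_regex -i "/etc/squid/porn.regex"

http_access deny porn_ip
http_access deny porn_dom
http_access deny porn_url

dstdomain lookups are much cheaper than url_regex scans, which is the
main payoff of moving as many entries as possible into the plain-domain
file.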
Amos