Or use an alternative: ufdbGuard.
ufdbGuard is a URL filter for Squid that has a much easier
configuration file than the Squid ACLs and additional
configuration files.
ufdbGuard is also multithreaded and very fast.
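As a minimal sketch of how it hooks in (the install path and helper count here are assumptions; adjust for your build), Squid talks to the ufdbguardd daemon through small ufdbgclient helpers:

# assumed install location; point this at wherever ufdbgclient actually lives
url_rewrite_program /usr/local/ufdbguard/bin/ufdbgclient
url_rewrite_children 16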
And a tip: if you are really serious about blocking content, you should
also block 'proxy sites' (i.e. sites used to circumvent URL filters).
-Marcus
Amos Jeffries wrote:
Muhammad Sharfuddin wrote:
On Mon, 2010-03-22 at 08:47 +0100, Marcello Romani wrote:
Muhammad Sharfuddin ha scritto:
On Mon, 2010-03-22 at 19:27 +1300, Amos Jeffries wrote:
Thanks to the list for the help.
Restarting Squid is not a solution: I noticed that only 20 minutes
after restarting, Squid started consuming CPU again.
On Wed, 2010-03-17 at 19:54 +1100, Ivan . wrote:
you might want to check out this thread
http://www.mail-archive.com/squid-users@xxxxxxxxxxxxxxx/msg56216.html
I have not installed any package either, i.e. I have not tried that.
On Wed, 2010-03-17 at 05:27 -0700, George Herbert wrote:
or install the Google malloc library and recompile Squid to
use it instead of the default glibc malloc.
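For the archives, a rough sketch of that rebuild, assuming gperftools/tcmalloc is already installed (the prefix and options are illustrative, not a tested recipe):

# link Squid against tcmalloc instead of the default system malloc
./configure --prefix=/usr/local/squid LDFLAGS="-ltcmalloc"
make && make install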
On Wed, 2010-03-17 at 15:01 +0200, Henrik K wrote:
If the system regex is the issue, wouldn't it be better/simpler to just
compile with PCRE? (LDFLAGS="-lpcreposix -lpcre"). It doesn't leak and
as a bonus makes your REs faster.
Nor have I re-compiled Squid, as I have to use the binary/RPM version of
Squid that shipped with the distro I am using.
The issue was resolved by removing the ACL that blocked almost 60K
URLs/domains. Commenting out the following worked:
##acl porn_deny url_regex "/etc/squid/domains.deny"
##http_access deny porn_deny
So how can I deny illegal content/websites?
If those were actually domain names...
they are both URLs and domains
* use "dstdomain" type instead of regex.
OK, nice suggestion.
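For illustration: if domains.deny were reduced to bare domain names (one per line, with a leading dot to also cover subdomains), the same block needs no regex at all:

# domains.deny now holds plain entries like: .example.com
acl porn_deny dstdomain "/etc/squid/domains.deny"
http_access deny porn_deny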
Optimize the order of the ACLs so that most rejections happen as early
as possible, using the fastest match types.
I think it's optimized, as the rule (squeezing the CPU) is the first
rule in squid.conf.
That's the exact opposite of "optimizing", as the CPU-consuming rule
is _always_ executed.
The first rules should be cheap (i.e. non-regexp) and should block
most of the traffic, leaving the CPU-consuming ones at the bottom,
rarely executed.
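In other words, a shape like this (the ACL names and addresses are made up for illustration):

# cheap source-address test first: it stops most traffic with no regex work
acl mynet src 10.0.0.0/8
http_access deny !mynet
# expensive regex test last: only requests that survive the cheap tests reach it
acl badurls url_regex "/etc/squid/regex.deny"
http_access deny badurls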
If you don't mind sharing your squid.conf access lines we can work
through optimizing with you.
I posted squid.conf when I started this thread/topic, but I have no
issue posting it again ;)
I think he meant the list of blocked sites / url
It's 112K after compression; am I allowed to post/attach such a big
file?
The mailing list will drop all attachments.
squid.conf:
acl myFTP port 20 21
acl ftp_ipes src "/etc/squid/ftp_ipes.txt"
http_access allow ftp_ipes myFTP
The optimal form of that is:
acl myFTP proto FTP
http_access allow myFTP ftp_ipes
NP: Checking the protocol is faster than checking a whole list of IPs or
list of ports.
http_access deny myFTP
Since you only have two network IP ranges that might possibly be allowed
after the regex checks, it's a good idea to start the entire process by
blocking the vast range of IPs which are never going to be allowed:
acl vip src "/etc/squid/vip_ipes.txt"
acl mynet src "/etc/squid/allowed_ipes.txt"
http_access deny !vip !mynet
#### this is the acl eating CPU #####
acl porn_deny url_regex "/etc/squid/domains.deny"
http_access deny porn_deny
###############################
acl vip src "/etc/squid/vip_ipes.txt"
http_access allow vip
acl entweb url_regex "/etc/squid/entwebsites.txt"
http_access deny entweb
Applying the same process to entwebsites.txt that was done to the
domains.deny file will stop this one from becoming a second CPU waste.
acl mynet src "/etc/squid/allowed_ipes.txt"
http_access allow mynet
This is the basic process for reducing a large list of regex patterns
down to an optimal set of ACL tests.
What you can do to start with is separate all the domain-only lines from
the real regex patterns:
grep -E '^\^?((https?|ftp)://)?[a-z0-9\.-]+(/?\$?)$' /etc/squid/domains.deny > dstdomain.deny
grep -v -E '^\^?((https?|ftp)://)?[a-z0-9\.-]+(/?\$?)$' /etc/squid/domains.deny > url_regex.deny
... check the output of those two files. Don't trust my 2-second pattern
creation.
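A quick sanity check is that the line counts of the two output files add up to the original:

wc -l /etc/squid/domains.deny dstdomain.deny url_regex.deny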
You will also need to strip any "^", "$", "http://" and "/" bits off the
dstdomain patterns.
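Something along these lines may do it (a sketch; verify the output rather than trusting it blindly):

# strip a leading "^", a protocol prefix, then a trailing "$" and "/"
sed -E 's,^\^,,; s,^(ht|f)tps?://,,; s,\$$,,; s,/$,,' dstdomain.deny > dstdomain.clean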
When that's done, see if there are any domains you can wildcard in the
dstdomain list. Loading the result into squid.conf may produce WARNING
lines about other duplicates that can also be removed. I'll call the ACL
using this file "stopDomains" in the following example.
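For example, hypothetical entries like these:

www.example.com
shop.example.com
example.com

collapse into the single wildcard entry ".example.com", which dstdomain matches against the domain and all of its subdomains.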
For the other file, with the entries where the URL still needs a full
pattern match, split that to create another three files (a made-up
example follows the list):
1) dstdomains where the domain is part of the pattern. I'll call this
"regexDomains" in the following example.
2) the full URL regex patterns with domains in (1). I'll call this
"regexUrls" in the example below.
3) regex patterns where domain name does not matter to the match.
I'll call that "regexPaths".
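For illustration, with made-up patterns: "^http://badsite\.com/videos/" contributes "badsite.com" to file (1) and the full pattern to file (2), while a domain-agnostic pattern like "/webproxy/" belongs in file (3):

regexDomains:  badsite.com
regexUrls:     ^http://badsite\.com/videos/
regexPaths:    /webproxy/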
When that's done, change your config to make your CPU-expensive lines:
acl porn_deny url_regex "/etc/squid/domains.deny"
http_access deny porn_deny
change into these:
# A
acl stopDomains dstdomain "/etc/squid/dstdomain.deny"
http_access deny stopDomains
# B
acl regexDomains dstdomain "/etc/squid/dstdomain.regexDomains"
acl regexUrls url_regex -i "/etc/squid/regex.urls"
http_access deny regexDomains regexUrls
# C
acl regexPaths urlpath_regex -i "/etc/squid/regex.paths"
http_access deny regexPaths
As you can see regex is not done unless it really has to be done.
At "A" the domains which don't have to use regex at all get blocked
very fast with little CPU usage.
At "B" the domains get checked and only the ones which might actually
patch get a regex done to them.
At "C" we have no choice so a regex is done as before. But (a) the list
should now be very small and not use much CPU, and (b) most of the
blocked domains are already blocked.
Amos