Re: Re: I need help with url_regex


On 10/09/10 09:17, devlin7 wrote:

Thanks Amos for the feedback.

It must be that I am entering it incorrectly because anything with a * or ?
doesn't work at all.

Are you sure that the "." is treated as "any character"?

I am. In POSIX regex...
 "." means any (single) character.
 "*" means zero or more of the previous item.
 "+" means one or more of the previous item.
 "?" means zero or one of the previous item.
 "\" means treat the next character as literal, even if it's usually special.

By "item" above I mean one character or a whole bracketed () group.

To be matched as part of the source text the reserved characters all need to be escaped like \? in the pattern.
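
For example, here is the difference an escape makes (the ACL names are just illustrative):

  # unescaped "." matches ANY character, so this also hits www.sinfo.com
  acl looseInfo url_regex .info
  # escaped "\." matches a literal dot only
  acl literalInfo url_regex \.info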


I would have thought that blocking .info would block any site that had .info
in it like www.porn.info but from what you are saying it would also block
www.sinfo.com. Am I correct?

Yes. These accidental matches are most of the problem with this type of config.


So is there a better way?

Yes, a few. Breaking the denial into several rules makes the matching both faster and more precise.


In most cases you will find you can do away with the regex part entirely and ban a whole domain. This way you can also search online and download lists of proxy domains to block wholesale, which is far easier than trying to build the list yourself. SquidGuard, DansGuardian, and the ufdb tools provide lists like this, and RHSBL anti-spam lists often include open proxy domains.
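
A rough sketch of loading such a downloaded list into a dstdomain ACL (the file path is just an example):

  # one domain per line; a leading dot matches the domain and all its subdomains
  acl proxyDomains dstdomain "/etc/squid/proxy-domains.txt"
  http_access deny proxyDomains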


Some matches you can limit to certain domains, running the regex against only the path portion of the URL (urlpath_regex matches the path plus query string):

  # fast check: destination is example.com or any .info domain
  acl badDomains dstdomain .example.com .info
  # regex tested against the URL path + query string only
  acl urlPathRegex urlpath_regex ^/browse\.php \.php\?q= \.php\?u=i8v
  # deny only when BOTH ACLs match; the cheap dstdomain test runs first
  http_access deny badDomains urlPathRegex


There will be some patterns which detect certain types of broken CMS (usually the search component "\?q=" like I mentioned) acting like a proxy even if they were never intended that way. A urlpath_regex without the domain protection above is needed to catch the many sites using these CMS. Just be sure of, and careful with, the patterns.
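
Something like this (the patterns are only illustrative; tune them against what you actually see in your logs):

  # catch CMS search/redirect components being fed full URLs to fetch
  acl cmsProxyPaths urlpath_regex \?q=http \.php\?u=http
  http_access deny cmsProxyPaths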


NP: ordering your rules in the same order I've named them above will even provide some measure of speed gain for the proxy. dstdomain matching is rather fast; regex is slow and resource-hungry.
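
So, using the sketch ACL names from above, a rough ordering would be:

  # plain domain lookup: fast, eliminates most requests cheaply
  http_access deny proxyDomains
  # regex, but only tested on the listed domains
  http_access deny badDomains urlPathRegex
  # regex tested on every request: slowest, leave until last
  http_access deny cmsProxyPaths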


To back up all of this you need reliable management support behind the blocking policy, with stronger enforcement for students caught actively trying to evade it. Without those you are in the sad position of an endless race.

Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.8
  Beta testers wanted for 3.2.0.2

