
Re: What are the pros and cons of filtering URLs using squid.conf?


 



On 6/11/2013 3:36 PM, Marcus Kool wrote:


On 06/11/2013 09:09 AM, Jose-Marcio Martins wrote:
On 06/11/2013 12:50 PM, Marcus Kool wrote:


There is a big misunderstanding:
in the old days, when the only URL filter was squidGuard, Squid had
to be reloaded in order for squidGuard to reload its database.
And when Squid reloads, *everything* pauses.
_But things have changed since then_:
- ICAP-based URL filters can reload a URL database without Squid
reloading
- ufdbGuard, which is a URL redirector just like squidGuard, can also
reload a URL database without
Squid reloading.

The above implies that ICAP-based filters and ufdbGuard are good
alternatives to squidGuard or to filtering by ACLs.

...

ufdbGuard loads the URL database in memory and is multithreaded.

OK, if it can handle 50000 queries per second.

So my question is... a more direct and precise question, just about
ufdbGuard. While ufdbGuard reloads its URL database, does it pause
answering queries? If yes, how long does it take?

ufdbGuard does not pause answering queries from Squid during a reload
since that would pause Squid and is considered an interruption of service.

ufdbGuard releases the current URL database, loads a new configuration
and loads a new URL database in 10 seconds on average.
ufdbGuard has a configurable behaviour in this 10-second interval and
does one of the following:
- allow all URLs; send an "OK" back to Squid immediately (default)
- allow all URLs but also introduce artificial delays when sending
replies back to Squid.
   The effect is that traffic is slowed down and the total number of
unfiltered URLs is reduced.
- deny all URLs; send a "not OK" back to Squid immediately.  The end
user receives a message like "try again in a few moments".

The last option is for the admins who need maximum control and are
afraid that users or applications can benefit from the URL filter
passing all URLs for 10 seconds.
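
To illustrate the three behaviours above, here is a minimal Python sketch of
a filter that keeps answering during a reload. The mode names, the delay
value and the toy database are illustrative assumptions, not ufdbGuard's
actual code or configuration:

import threading
import time

# Reload-time behaviours as described above; the names are only illustrative.
ALLOW_ALL = "allow"          # default: answer "OK" immediately
ALLOW_WITH_DELAY = "delay"   # answer "OK" but slow the replies down
DENY_ALL = "deny"            # answer "not OK" until the reload finishes

class UrlFilter:
    def __init__(self, database, reload_mode=ALLOW_ALL, delay_seconds=1.0):
        self.database = database          # e.g. {"porn.example.com": "porn"}
        self.reload_mode = reload_mode
        self.delay_seconds = delay_seconds
        self.reloading = threading.Event()

    def reload(self, load_new_database):
        """Swap in a new database without ever refusing to answer."""
        self.reloading.set()
        try:
            self.database = load_new_database()   # may take ~10 seconds
        finally:
            self.reloading.clear()

    def check(self, domain):
        """Return True if the URL may pass, False if it must be denied."""
        if self.reloading.is_set():
            if self.reload_mode == DENY_ALL:
                return False                      # "try again in a few moments"
            if self.reload_mode == ALLOW_WITH_DELAY:
                time.sleep(self.delay_seconds)    # throttle unfiltered traffic
            return True                           # allow everything meanwhile
        return self.database.get(domain) not in ("porn", "malware")

Running reload() in a separate thread is what keeps check() answering
throughout the load, instead of everything pausing as in the old squidGuard
days.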

Marcus
It is a very clever idea.
I still prefer a real-time upgradeable DB which doesn't require a reload, etc.
The above will require a more precise algorithm that works in another way than categories alone. I am almost sure that squidGuard actually compiles a basic lookup algorithm when loading the config files. If someone who is familiar with the internals can think about it with me, a two-way tree (one by category and a second by filtering levels) could be a very nice idea.
Something like:
porn is bad and in level -51
malware is bad and in level -50
news is bad and in level -30
etc...
This way we can filter with another approach than the one we are used to.
The only difference is the static algorithm which checks the URL by domain and by domain+path:
check in the DB whether the domain+path exists, and if so, what level it is on;
check in the DB whether the domain alone exists, and if so, what level it is on.
It is a very simple idea which puts a lot of load on the DB but keeps the algorithm very simple; a small sketch of it follows below. A YouTube video can be filtered very easily by categorizing the full set of YouTube URL formats with a small "addon" algorithm which knows all the ways a YouTube video URL can appear.
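
A rough Python sketch of that two-step lookup, reusing the level numbers from
the example above; the domains and entries in the table are only placeholders:

from urllib.parse import urlparse

# Hypothetical level table: more negative = worse, as in the example above.
LEVELS = {
    "porn.example.com": -51,                 # domain-only entry
    "malware.example.net": -50,
    "news.example.org": -30,
    "video.example.com/watch/bad-clip": -51, # domain+path entry
}

def lookup_level(url):
    """Return the filtering level for a URL, or None if it is unlisted.

    Step 1: try the exact domain+path.
    Step 2: fall back to the domain alone.
    """
    parsed = urlparse(url)
    domain = parsed.hostname or ""
    path = parsed.path.rstrip("/")

    level = LEVELS.get(domain + path)   # domain+path first
    if level is None:
        level = LEVELS.get(domain)      # then domain only
    return level

# Example: block anything at level -50 or below.
print(lookup_level("http://porn.example.com/index.html"))       # -51
print(lookup_level("http://video.example.com/watch/bad-clip"))  # -51
print(lookup_level("http://unknown.example.com/"))              # None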
It's better than a simple regex and makes the search better.
If, as in squidGuard, I would categorize a video by one domain plus a video identifier, we can use a set of domains so that the next search is either an exact match or a check whether the ID exists (sketched below as well). It can work with lots of sites, since a URL should reflect specific content; if there is more than one URL per object, a header will provide enough data on the request to identify it.
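
And a rough sketch of the small "addon" that reduces the different ways a
YouTube video URL can appear to a single video ID; only the common URL forms
are covered here, not a complete list:

from urllib.parse import urlparse, parse_qs

def youtube_video_id(url):
    """Extract the video ID from common YouTube URL forms, or return None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    if host in ("www.youtube.com", "youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":               # youtube.com/watch?v=ID
            return parse_qs(parsed.query).get("v", [None])[0]
        if parsed.path.startswith("/embed/"):     # youtube.com/embed/ID
            return parsed.path.split("/")[2] or None
    if host == "youtu.be":                         # youtu.be/ID
        return parsed.path.lstrip("/") or None
    return None

# The same video ID is found regardless of how the URL is written,
# so one DB entry per video ID is enough.
print(youtube_video_id("https://www.youtube.com/watch?v=VIDEOID123"))  # VIDEOID123
print(youtube_video_id("https://youtu.be/VIDEOID123"))                 # VIDEOID123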

What do you think about the above examples?

Eliezer



