On 06/11/2013 10:33 AM, Eliezer Croitoru wrote:
ufdbGuard does not pause answering queries from Squid during a reload
since that would pause Squid and is considered an interruption of service.
ufdbGuard releases the current URL database, loads a new configuration
and loads a new URL database in 10 seconds on average.
ufdbGuard has a configurable behaviour in this 10-second interval and
does one of the following:
- allow all URLs; send immediately an "OK" back to Squid (default)
- allow all URLs but also introduce artificial delays when sending
replies back to Squid.
The effect is that traffic is slowed down and the total number of
unfiltered URLs is reduced.
- deny all URLs; send immediately a "not OK" back to Squid. The end
user receives a message like "try again in a few moments".
The last option is for admins who need maximum control and are
concerned that users or applications could benefit from the URL filter
passing all URLs for 10 seconds.
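For illustration, a minimal sketch of what this could look like in
ufdbguard.conf; the directive name and values below are assumptions,
so check the ufdbGuard reference manual for the exact spelling:

    # behaviour during a database reload (directive name assumed for illustration)
    url-lookup-result-during-reload  allow    # default: pass all URLs
    # url-lookup-result-during-reload  deny   # maximum control: block all URLs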
Marcus
It is a very clever idea.
I still prefer a real-time upgradeable DB which doesn't require a reload, etc.
The above will require more precise algorithms that work in another way than by categories alone.
I am almost sure that squidGuard actually compiles a basic algorithm when loading the config files.
squidGuard and ufdbGuard do the same thing:
they look at the username, IP address and hostname of the user to find out to which group the user belongs;
in squidGuard/ufdbGuard terminology, they determine the "source".
Then for each source there is an ACL based on URL categories, e.g.:
block adult, block proxies, allow socialnetworking, block news and allow the rest.
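As a concrete illustration, a minimal squidGuard-style configuration
sketch (the source and category names are invented for the example):

    src office {
        ip    192.168.1.0/24
        # user alice bob        # a source may also match on username
    }

    dest adult {
        domainlist adult/domains
        urllist    adult/urls
    }
    # dest proxies, socialnetworking and news are defined the same way

    acl {
        office {
            pass !adult !proxies socialnetworking !news all
        }
    }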
If someone who is familiar with the internals can think together with me about a two-way tree (one way by category and a second by filtering levels), I think it would be a very nice idea.
Something like:
porn is bad and is at level -51
malware is bad and is at level -50
news is bad and is at level -30
etc...
This way we can filter with another approach than the one we are used to.
The only difference is the static algorithm, which verifies the URL by domain and by full URL:
- check in the DB whether there is a domain+path entry, and if so, what level it is at;
- check in the DB whether there is a domain-only entry, and if so, what level it is at.
It's a very simple idea which puts a lot of load on the DB but keeps the algorithm very simple.
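A rough sketch of that lookup in Python, assuming the DB is a simple
table that maps a domain or domain+path to a level (the entries and
thresholds below are invented):

    from urllib.parse import urlparse

    # toy in-memory "DB": key is "domain" or "domain/path", value is the level
    LEVELS = {
        "videos.example/adult": -51,   # porn (invented entry)
        "badsite.example":      -50,   # malware (invented entry)
        "news.example":         -30,   # news (invented entry)
    }

    def lookup_level(url):
        """Return the most specific level for a URL: domain+path first, then domain only."""
        parts = urlparse(url)
        domain = parts.hostname or ""
        segments = [s for s in parts.path.split("/") if s]
        # 1) check domain+path, from most specific to least specific
        for i in range(len(segments), 0, -1):
            key = domain + "/" + "/".join(segments[:i])
            if key in LEVELS:
                return LEVELS[key]
        # 2) fall back to a domain-only entry
        return LEVELS.get(domain)

    def allowed(url, user_level):
        """Allow the URL unless its level is below the user's filtering level."""
        level = lookup_level(url)
        return level is None or level >= user_level

    # a user filtered at level -40 gets news but not porn or malware
    print(allowed("http://news.example/world", -40))           # True
    print(allowed("http://videos.example/adult/clip1", -40))   # False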
You lost me here. What are you trying to achieve?
A YouTube video can be filtered very easily by categorizing the full set of URL forms YouTube uses, with a small "add-on" algorithm which knows all the ways a YouTube video URL can appear.
It's better than a simple regex and makes the search more precise.
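For example, a small sketch of such an add-on in Python; the URL forms
below are the common ones and the list may not be complete:

    from urllib.parse import urlparse, parse_qs

    def youtube_video_id(url):
        """Extract the video ID from the common YouTube URL forms, or return None."""
        parts = urlparse(url)
        host = (parts.hostname or "").lower()
        if host == "youtu.be":
            # http://youtu.be/VIDEOID
            return parts.path.strip("/").split("/")[0] or None
        if host.endswith("youtube.com"):
            if parts.path == "/watch":
                # http://www.youtube.com/watch?v=VIDEOID
                return parse_qs(parts.query).get("v", [None])[0]
            for prefix in ("/embed/", "/v/"):
                # http://www.youtube.com/embed/VIDEOID and /v/VIDEOID
                if parts.path.startswith(prefix):
                    return parts.path[len(prefix):].split("/")[0] or None
        return None

    print(youtube_video_id("http://youtu.be/dQw4w9WgXcQ"))                # dQw4w9WgXcQ
    print(youtube_video_id("http://www.youtube.com/watch?v=dQw4w9WgXcQ")) # dQw4w9WgXcQ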
Also here: what are you trying to achieve on Youtube?
If in squidGuard I categorize a video by one domain and a video identifier, we can use a set of domains so that the next lookup can be either a precise URL match or a check whether the ID exists.
It can work with lots of sites, since a URL should map to specific content.
If there is more than one URL per object, then a header will provide enough data on the request to identify it.
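A sketch of that lookup, with an invented table mapping a video ID to
a category that is shared across a set of domains:

    # toy table: video ID -> category, valid for any domain in the set
    VIDEO_CATEGORIES = {"dQw4w9WgXcQ": "music"}                    # invented entry
    VIDEO_DOMAINS = {"youtube.com", "www.youtube.com", "youtu.be"}

    def categorize(domain, video_id):
        """Return the category for an ID on any known domain, or None if unknown."""
        if domain not in VIDEO_DOMAINS:
            return None
        return VIDEO_CATEGORIES.get(video_id)

    print(categorize("youtu.be", "dQw4w9WgXcQ"))   # music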
Headers are not sent by Squid to a URL redirector. One needs an ICAP server for that.
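For completeness, the two hookup points in squid.conf look roughly
like this (the paths and the ICAP port are examples):

    # URL redirector/rewriter: receives the URL, client IP and ident, but no headers
    url_rewrite_program /usr/local/bin/squidGuard -c /etc/squidguard/squidGuard.conf

    # ICAP service: sees the full request, including all headers
    icap_enable on
    icap_service svc_req reqmod_precache bypass=0 icap://127.0.0.1:1344/reqmod
    adaptation_access svc_req allow all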
What do you think about the above examples?
Eliezer