
Re: web filtering

On Sat, Sep 30, 2006 at 04:19:13PM +0200, Christoph Haas wrote:
> On Saturday 30 September 2006 05:11, Chuck Kollars wrote:
> > Our experience with web filtering is that the differences
> > in tools are _completely_ swamped by the quality and
> > depth of the blacklists. (The reverse of course is
> > also true: lack of good blacklists will doom _any_
> > filtering tool.)
> >
> > We currently have over 500,000 (!) sites listed in
> > just the porn section of our blacklist. With quality
> > lists like these, any old tool will do a decent job.
> 
> And large portions of those half million sites are probably already
> something other than porn sites, or the domains have been given up. I
> wouldn't judge the quality solely by the quantity.
> 
> > Lots of folks need to get such lists reasonably and
> > regularly (quarterly?).
> 
> Daily even.
> 
> > Useful lists are far far too 
> > large to be maintained by local staff. Probably what's
> > needed is a mechanism whereby everybody nationwide
> > contributes, some central site consolidates and
> > sanitizes, and then publishes the lists.
> 
> I'd welcome such an effort. Some companies invest a lot of effort into URL
> categorisation - not just regarding porn sites. But they have several
> employees working full-time on that and run a kind of editorial office.
> For a free/open-source project you would need a lot of people and some
> mechanism (e.g. a web spider) that searches for further sites. And that
> job is boring, so compared to other free/open-source projects there is
> much less motivation to contribute constantly.
> 
> > This would be a huge effort. It's not easily possible
> > even with lots of clever scripts and plenty of compute
> > power. We've already seen more than a handful of
> > "volunteers" swallowed up by similar efforts.
> 
> I believe the only blacklist that has survived over the years is
> http://urlblacklist.com/ - except that it is non-free now. I may be
> mistaken about its history though.
> 
> There already exist DNS-based blacklists that are very effective for mail
> spam detection. Perhaps a DNS-based register where you can look up whether
> a certain domain belongs to a certain category might help. Large
> installations like ISPs could mirror the DNS zone and private people could
> just query it. Perhaps even the Squid developers could support such a
> blacklist.
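
[A minimal sketch of how such a lookup could work, modelled on the way
URI DNS blacklists (e.g. SURBL) are queried for mail: the client asks
for <domain>.<zone> and any returned A record means "listed". The zone
name below is made up for the example; no such public zone is implied.]

import socket

def domain_is_listed(domain, zone="porn.category.example"):
    """Return True if <domain>.<zone> resolves, i.e. the domain is listed."""
    try:
        # any A record counts as "listed", following DNSBL convention
        socket.gethostbyname("%s.%s" % (domain, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN (or lookup failure) means "not listed"
        return False

print(domain_is_listed("playboy.com"))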
> 
> So IMHO we lack both a source (volunteers, spider, web based contribution 
> system) and a good way to use it. Huge static ACLs don't work well with 
> Squid.
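
[One way around huge static ACLs, assuming Squid's external_acl_type
helper interface (available since Squid 2.5), is to push the lookup
into a helper process: Squid sends one lookup key per line on stdin and
expects "OK" or "ERR" back per line on stdout. The helper below is only
a sketch; the blacklist file name is invented for the example.]

#!/usr/bin/env python
# Sketch of an external ACL helper: Squid passes one lookup key per line
# (assumed here to be the destination domain, e.g. the %DST format token)
# and reads back "OK" (matched) or "ERR" (not matched).
import sys

def load_blacklist(path="/etc/squid/porn.domains"):   # file name is made up
    with open(path) as f:
        return set(line.strip().lower() for line in f if line.strip())

def main():
    blocked = load_blacklist()
    for line in sys.stdin:
        domain = line.strip().lower()
        parts = domain.split(".")
        # match the domain itself or any parent domain on the list
        hit = any(".".join(parts[i:]) in blocked for i in range(len(parts)))
        sys.stdout.write("OK\n" if hit else "ERR\n")
        sys.stdout.flush()   # Squid waits for an answer to each request

if __name__ == "__main__":
    main()

[Such a helper would be wired into squid.conf with an external_acl_type
definition plus an "acl ... external ..." line, though the exact syntax
depends on the Squid version.]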
> 
> Since I had to tell our managers at work how well URL filtering works
> (we use a commercial solution), I pulled some numbers. Around 3,000
> domains are registered at DeNIC (the German domain registry) alone every
> day. Now add the other registries and you get a rough idea of how many
> domains need to be categorized every day. That's the reason why it's so
> hard to create reasonable blacklists. (And also the cause of my rants
> when people expect decent filtering just by using the currently publicly
> available blacklists.)
> 
> You didn't tell much about your intentions though. :)
> 
> Kindly
>  Christoph

As was already stated above, sites come and go daily.  It would be very
difficult to keep any list current.  Even updating on a daily basis, you
would still be behind.  Perhaps in addition to Squid, you could add other
filtering software that checks the web page content itself.  I use
Squid + DansGuardian for my children; you could look for something like
that.  I only block known URLs that are less likely to change overnight -
for example, playboy.  It is not a perfect solution, but it helps keep my
URL list down.  Wish you well.
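
[The content-checking side that DansGuardian handles can be pictured as
weighted phrase matching. The snippet below is only an illustration of
that idea - phrases, weights and threshold are invented - and is not
DansGuardian's actual implementation.]

# Illustration only: score page text against a weighted phrase list and
# block anything above a threshold.
PHRASE_WEIGHTS = {
    "hardcore": 40,
    "xxx rated": 30,
    "adults only": 25,
    "sex education": -20,   # negative weights rescue legitimate pages
}
THRESHOLD = 50

def page_score(text):
    text = text.lower()
    return sum(weight * text.count(phrase)
               for phrase, weight in PHRASE_WEIGHTS.items())

def should_block(text, threshold=THRESHOLD):
    return page_score(text) >= threshold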

-- 
Alex
FreeBSD 6.0-RELEASE i386 GENERIC
