Re: web filtering

On Saturday 30 September 2006 05:11, Chuck Kollars wrote:
> Our experience with web filtering is the differences
> in tools are _completely_ swamped by the quality and
> depth of the blacklists. (The reverse of course is
> also true: lack of good blacklists will doom _any_
> filtering tool.)
>
> We currently have over 500,000 (!) sites listed in
> just the porn section of our blacklist. With quality
> lists like these, any old tool will do a decent job.

And large portions of those half million sites have probably already turned 
into something other than porn sites, or the domains have been given up. I 
wouldn't judge the quality solely by the quantity.

> Lots of folks need to get such lists reasonably and
> regularly (quarterly?).

Daily even.

> Useful lists are far far too 
> large to be maintained by local staff. Probably what's
> needed is a mechanism whereby everybody nationwide
> contributes, some central site consolidates and
> sanitizes, and then publishes the lists.

I'd welcome such an effort. Some companies invest a lot of effort into URL 
categorisation - not just regarding porn sites. But they have several 
employees working full-time on that and run a kind of editor's office.
For a free/open-source project you would need a lot of people and some 
mechanism (e.g. a web spider) that searches for further sites. And doing 
that job is boring. So compared to other free/open-source projects there 
is much less motivation to contribute constantly.

> This would be a huge effort. It's not easily possible
> even with lots of clever scripts and plenty of compute
> power. We've already seen more than a handful of
> "volunteers" swallowed up by similar efforts.

I believe that the only blacklist that has survived over the years is 
http://urlblacklist.com/ - except that it is non-free now. I may be 
mistaken about its history though.

There already exist DNS-based blacklists that are very effective for mail 
spam detection. Perhaps a DNS-based registry where you can look up whether 
a certain domain belongs to a certain category would help. Large 
installations like ISPs could mirror the DNS zone and private users could 
simply query it. Perhaps even the Squid developers could support such a 
blacklist.
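
A minimal sketch of how such a lookup could work, modelled on the existing 
mail DNSBLs. The zone name and the return-address convention below are pure 
assumptions, not an existing service:

  import socket

  # Hypothetical category zone, queried like a mail DNSBL: the domain is
  # prepended to the zone name and the answer (if any) encodes the category.
  CATEGORY_ZONE = "category.bl.example"        # assumption: no such zone exists
  CATEGORIES = {"127.0.0.2": "porn",
                "127.0.0.3": "gambling",
                "127.0.0.4": "warez"}

  def lookup_category(domain):
      """Return the category of `domain`, or None if it is not listed."""
      try:
          answer = socket.gethostbyname("%s.%s" % (domain, CATEGORY_ZONE))
      except socket.gaierror:                  # NXDOMAIN -> not listed
          return None
      return CATEGORIES.get(answer, "unknown")

An ISP could then mirror the zone with a secondary name server so the 
lookups stay local and fast, exactly as many sites already do with spam 
DNSBLs.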

So IMHO we lack both a source (volunteers, a spider, a web-based 
contribution system) and a good way to use it. Huge static ACLs don't work 
well with Squid.
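
Squid's external ACL interface could bridge that gap: instead of loading a 
huge static domain list, a small helper could ask the (hypothetical) DNS 
category zone from above for every destination. Just a rough sketch - the 
helper name and the squid.conf lines are illustrative only:

  #!/usr/bin/env python
  # Sketch of a Squid external ACL helper that asks a hypothetical DNS
  # category zone instead of matching against a huge static ACL.
  #
  # Illustrative squid.conf wiring (names are examples only):
  #   external_acl_type dns_category children=10 %DST /usr/local/bin/dns_category.py
  #   acl porn external dns_category
  #   http_access deny porn
  import socket
  import sys

  CATEGORY_ZONE = "category.bl.example"        # assumption: no such zone exists

  def is_listed(domain):
      try:
          socket.gethostbyname("%s.%s" % (domain, CATEGORY_ZONE))
          return True                           # any answer means "listed"
      except socket.gaierror:
          return False                          # NXDOMAIN means "not listed"

  def main():
      # Squid sends one request per line (here only %DST, the destination
      # domain) and expects an OK or ERR reply for each one.
      for line in sys.stdin:
          domain = line.strip()
          sys.stdout.write("OK\n" if is_listed(domain) else "ERR\n")
          sys.stdout.flush()                    # Squid waits for each reply

  if __name__ == "__main__":
      main()

That would keep the list itself out of Squid's memory and let the 
resolver's cache do most of the work.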

Since I had to tell our managers at work how well URL filtering works (we 
use a commercial solution) I pulled some numbers. Around 3,000 domains are 
registered at DeNIC (the German domain registry) alone every day. Now add 
the other registries and you get a rough idea of how many domains need to 
be categorised every day. That's the reason why it's so hard to create 
reasonable blacklists. (And also the cause of my rants when people expect 
decent filtering by just using the currently available public blacklists.)
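
Just to make the scale concrete, a back-of-envelope calculation - only the 
3,000 DeNIC registrations per day come from above; the worldwide share and 
the review time per domain are made-up assumptions:

  # Back-of-envelope estimate of the daily categorisation workload.
  # Only the DeNIC figure is real; the other two numbers are assumptions.
  denic_per_day = 3000            # new .de registrations per day (see above)
  assumed_denic_share = 0.05      # assumption: DeNIC ~5% of new domains worldwide
  assumed_seconds_each = 15       # assumption: time to classify one domain

  worldwide_per_day = denic_per_day / assumed_denic_share
  person_hours = worldwide_per_day * assumed_seconds_each / 3600.0
  print("roughly %d new domains per day, %.0f person-hours of review" %
        (worldwide_per_day, person_hours))

Even with those cautious numbers that comes to a few hundred person-hours 
of review every single day.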

You didn't say much about your intentions though. :)

Kindly
 Christoph
