On Sat, Sep 30, 2006 at 04:19:13PM +0200, Christoph Haas wrote:
> On Saturday 30 September 2006 05:11, Chuck Kollars wrote:
> > Our experience with web filtering is the differences
> > in tools are _completely_ swamped by the quality and
> > depth of the blacklists. (The reverse of course is
> > also true: lack of good blacklists will doom _any_
> > filtering tool.)
> >
> > We currently have over 500,000 (!) sites listed in
> > just the porn section of our blacklist. With quality
> > lists like these, any old tool will do a decent job.
>
> And large portions of those half million sites are probably already
> something other than porn sites, or the domains were given up. I
> wouldn't judge the quality completely by the quantity.
>
> > Lots of folks need to get such lists reasonably and
> > regularly (quarterly?).
>
> Daily even.
>
> > Useful lists are far far too
> > large to be maintained by local staff. Probably what's
> > needed is a mechanism whereby everybody nationwide
> > contributes, some central site consolidates and
> > sanitizes, and then publishes the lists.
>
> I'd welcome such an effort. Some companies invest a lot of effort into URL
> categorisation - not just regarding porn sites. But they have several
> employees working full-time on that and run a kind of editor's office.
> For a free/open-source project you would need a lot of people and some
> mechanism (e.g. a web spider) that searches for further sites. And doing
> that job is boring. So compared to other free/open-source projects there
> is much less motivation to contribute constantly.
>
> > This would be a huge effort. It's not easily possible
> > even with lots of clever scripts and plenty of compute
> > power. We've already seen more than a handful of
> > "volunteers" swallowed up by similar efforts.
>
> I believe that the only blacklist that survived over the ages was
> http://urlblacklist.com/ - just that they are non-free now. I may be
> mistaken about its history though.
>
> There already exist DNS-based blacklists that are very effective for mail
> spam detection. Perhaps a DNS-based register where you can look up whether
> a certain domain belongs to a certain category might help. Large
> installations like ISPs could mirror the DNS zone and private people could
> just use them. Perhaps even the Squid developers could support such a
> blacklist.
>
> So IMHO we lack both a source (volunteers, spider, web-based contribution
> system) and a good way to use it. Huge static ACLs don't work well with
> Squid.
>
> Since I had to tell our managers at work how well URL filtering works
> (we use a commercial solution) I pulled some numbers. Around 3,000
> domains are registered at DeNIC (the German domain registry) alone every
> day. Now try that with the other registries and you get a rough idea of
> how many domains need to be categorized every day. That's the reason why
> it's so hard to create reasonable blacklists. (And also the cause of my
> rants when people expect decent filtering by just using the current
> publicly available blacklists.)
>
> You didn't tell us much about your intentions though. :)
>
> Kindly
> Christoph

As was already stated above... sites come and go daily. It would be very
difficult to keep any list current; even updating it daily you would still
be behind. Perhaps in addition to Squid you could place other filtering
software in the path that checks the actual text of the pages. I use
Squid + DansGuardian for my children. You could look for something like it.
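In case it helps, here is a minimal sketch of the Squid side of that setup.
The ACL names, network and file path below are only examples, not taken
from my actual configuration:

    # squid.conf: define the local network and a file of blocked domains
    acl lan src 192.168.1.0/24
    acl blocked_sites dstdomain "/usr/local/etc/squid/blocked.domains"

    # deny the blacklist first, then allow the LAN, deny everything else
    http_access deny blocked_sites
    http_access allow lan
    http_access deny all

The blocked.domains file just holds one domain per line (a leading dot,
e.g. ".playboy.com", also matches subdomains). DansGuardian then runs in
front of Squid on its own port (8080 by default, handing requests on to
Squid on 3128) and scores the text of each page against its phrase lists,
so it can catch pages that are not on any domain list.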
I only block known URLs that are less likely to change overnight, for
example playboy. It is not a perfect solution, but it helps keep my URL
list down. Wish you well.

--
Alex
FreeBSD 6.0-RELEASE i386 GENERIC