Re: Access control : How to block a very large number of domains

Amos Jeffries <squid3@xxxxxxxxxxxxx> · Tue, 16 Jun 2009 15:12:07 +1200

On Mon, 15 Jun 2009 12:26:16 -0700 (PDT), hims92
<himanshu.singh.cse07@xxxxxxxxxxx> wrote:
> Hi,
> As far as I know, SquidGuard uses Berkeley DB (which is based on BTree and
> Hash tables) for storing the urls and domains to be blocked. But I need to
> store a huge amount of domains (about 7 millions) which are to be blocked.
> Moreover, the search time to check if the domain is there in the block list,
> has to be less than a microsecond.
> 
> So, Will Berkeley DB serve the purpose?
> 
> I can search for a domain using PATRICIA Trie in less than 0.1
> microseconds.
> So, if Berkeley DB is not good enough, how can I use the Patricia Trie
> instead of Berkeley DB in Squid to block the url?

To do tests with such critical timing you would be best to use an
internal ACL, which eliminates the network transfer delays to an external
process.

Are you fixed to a certain version of Squid?

Squid-2 is not bad to tweak, but it is not very easy to add a new ACL to either.

The Squid-3 ACLs are fairly easy to implement, and dropping a new one in is
simple. You can create your own version of dstdomain and have Squid do the
test. At present dstdomain uses an unbalanced splay tree on full
reverse-string matches, which is good but not as good as it could be for
large domain lists.
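
To illustrate the reverse-string idea, here is a minimal Python sketch of a
trie keyed on domain labels in reverse order. It is not Squid code and does
not use any Squid API; the class name DomainTrie and its methods are
hypothetical. As a simplification, every entry is assumed to match the domain
itself and all of its subdomains. A real ACL would be written in C++ inside
Squid, and a PATRICIA trie would compress the nodes further.

```python
class DomainTrie:
    """Hypothetical sketch: trie keyed on reversed domain labels."""

    # Sentinel key marking "a blocked entry ends at this node".
    # Using object() means it can never collide with a real label string.
    _END = object()

    def __init__(self):
        self.root = {}

    @staticmethod
    def _labels(name):
        # "www.Example.com." -> ["com", "example", "www"]
        return reversed(name.lower().strip(".").split("."))

    def add(self, domain):
        """Insert a blocked domain (assumed to cover its subdomains too)."""
        node = self.root
        for label in self._labels(domain):
            node = node.setdefault(label, {})
        node[self._END] = True

    def blocks(self, host):
        """Return True if host equals, or is a subdomain of, any entry."""
        node = self.root
        for label in self._labels(host):
            node = node.get(label)
            if node is None:
                return False          # no entry shares this suffix
            if self._END in node:
                return True           # a blocked entry is a suffix of host
        return False


blocklist = DomainTrie()
blocklist.add("example.com")
blocklist.add("ads.tracker.net")
print(blocklist.blocks("www.example.com"))
print(blocklist.blocks("tracker.net"))
```

In this sketch the lookup walks at most one node per label in the requested
host name, so its cost depends on the length of the name rather than on how
many domains are in the list. Whether that beats the splay tree in practice
would need to be measured inside Squid itself.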

If it scales well and is faster than the existing dstdomain it would be a
welcome addition.

Amos