Hello,

I ran the tests (blocking sites with squidGuard) using fewer domains, but Squid did not respond properly: the network became slow.

squid-2.5.STABLE11.tar
squidGuard-1.2.10.tar
Berkeley DB 4.2.52
Number of domains in blacklist: 656490 (0.6 million); URLs: 141581 (0.1 million)
Peak-time requests: 200/sec

Amos Jeffries-2 wrote:
>
> On Mon, 15 Jun 2009 12:26:16 -0700 (PDT), hims92
> <himanshu.singh.cse07@xxxxxxxxxxx> wrote:
>> Hi,
>> As far as I know, SquidGuard uses Berkeley DB (which is based on BTree and
>> Hash tables) for storing the urls and domains to be blocked. But I need to
>> store a huge amount of domains (about 7 millions) which are to be blocked.
>> Moreover, the search time to check if the domain is there in the block
>> list has to be less than a microsecond.
>>
>> So, will Berkeley DB serve the purpose?
>>
>> I can search for a domain using PATRICIA Trie in less than 0.1
>> microseconds.
>> So, if Berkeley DB is not good enough, how can I use the Patricia Trie
>> instead of Berkeley DB in Squid to block the url?
>
> To do tests with such a critical timing you would be best to use an
> internal ACL, which eliminates networking transfer delays to an external
> process.
>

Can you be a bit more specific about how to do that? I am pretty new to Squid.

> Are you fixed to a certain version of Squid?
>

No, I am not. But at present my institution has:

squid-2.5.STABLE11.tar
squidGuard-1.2.10.tar
Berkeley DB 4.2.52

I would like to find a solution for these versions only, if possible.

> Squid-2 is not bad to tweak, but not very easy to add to ACL either.
>
> The Squid-3 ACL are fairly easy to implement and drop a new one in. You can
> create your own version of dstdomain and have Squid do the test. At present
> dstdomain uses unbalanced splay tree on full reverse-string matches which
> is good but not so good as it could be for large domain lists.
>

How do we create our own version of dstdomain?
Do the earlier versions (2.x) of Squid also use an unbalanced splay tree to search for a URL/domain, or do they use linear search, binary search, or some other efficient search technique?

Would it be possible to store all the domains and URLs (0.7 million approx.) in an STL vector and then run binary_search to answer each query? I tested binary_search in a standalone C++ program, and the query time was quite satisfactory for me.

How does Squid handle requests for domain IPs? Does it store all domain IPs somewhere, or does it first perform a DNS lookup for the domain name and then check whether it is in the deny/access list before giving access?
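To make the vector-plus-binary_search idea concrete, here is a minimal standalone sketch. It is not Squid or squidGuard code, and it is not the test program mentioned above. The class name DomainBlocklist and the method is_blocked are made up for illustration. It assumes host names are already lowercased, and that a listed domain should also block its subdomains, which is an assumption about the desired policy.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Squid): a blocklist kept as a sorted
// vector of lowercase domain names, queried with std::binary_search.
class DomainBlocklist {
public:
    explicit DomainBlocklist(std::vector<std::string> domains)
        : domains_(std::move(domains)) {
        // binary_search requires sorted input; duplicates are dropped to save space.
        std::sort(domains_.begin(), domains_.end());
        domains_.erase(std::unique(domains_.begin(), domains_.end()),
                       domains_.end());
    }

    // Returns true if `host` is listed, or is a subdomain of a listed domain.
    // Each suffix that starts at a label boundary is looked up in turn, so
    // "www.example.com" tries "www.example.com", "example.com", then "com".
    bool is_blocked(const std::string& host) const {
        std::size_t pos = 0;
        while (true) {
            if (std::binary_search(domains_.begin(), domains_.end(),
                                   host.substr(pos))) {
                return true;
            }
            const std::size_t dot = host.find('.', pos);
            if (dot == std::string::npos) {
                return false;  // no shorter suffix left to try
            }
            pos = dot + 1;
        }
    }

private:
    std::vector<std::string> domains_;
};
```

Each query costs one binary search per label in the host name, so a host with k labels needs about k * log2(n) string comparisons for n listed domains. The list is sorted once at load time, which suits a blocklist that changes rarely.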