
Re: Complicate ACL affect performance?


 



Henrik K wrote:
On Sat, Oct 18, 2008 at 11:54:52PM +1300, Amos Jeffries wrote:
Henrik K wrote:
On Sat, Oct 18, 2008 at 12:44:46PM +0300, Henrik K wrote:
Not sure what the splay code does in Squid, didn't have time to grab it.

Produces a very inefficient unsorted but alphabetically ordered trinary tree.

But a simple test with Perl:

- Grepped some hostnames from wwwlogs etc
- Regexp::Assemble'd 50000 unique hostnames (= 560kB regex, took 22 sec)
- Ran 100000 hostnames through it in 4 seconds (25000 hosts/sec on a 2.8GHz CPU)

It's pretty powerful stuff.
Oops, I even did it slightly wrong.

Doing it correctly, using ^hostname$ instead of plain hostname in the regex,
results in 1.2 seconds; that's 80000+ hosts/sec.
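
To illustrate the difference the anchors make, here is a minimal Python sketch. It is not the Perl test above: Regexp::Assemble builds an optimised pattern, whereas the hypothetical assemble() helper below just joins escaped hostnames into a plain alternation. The hostnames are made up for the example.

```python
import re

def assemble(hostnames, anchored=True):
    """Join escaped hostnames into one alternation, optionally anchored at both ends."""
    body = "|".join(re.escape(h) for h in sorted(set(hostnames)))
    pattern = f"^(?:{body})$" if anchored else f"(?:{body})"
    return re.compile(pattern)

# Hypothetical sample data, not the hostnames from the original test.
hosts = ["www.example.com", "mail.example.org", "cdn.example.net"]
anchored = assemble(hosts, anchored=True)
unanchored = assemble(hosts, anchored=False)

# The unanchored pattern may match anywhere in the input, so it accepts a
# longer name that merely contains a listed hostname.
print(bool(unanchored.search("www.example.com.evil.test")))  # True
print(bool(anchored.search("www.example.com.evil.test")))    # False
print(bool(anchored.search("mail.example.org")))             # True
```

Besides being stricter, the anchored form only has to attempt a match at the start of the string, which is one plausible reason for the speedup reported above.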

Still out slightly. A fair test of that against the Squid splay tree would still omit the ^, so that it matches any given *.example.com$

Fair test would be reversing the hostname, which is very cheap operation. ;)

No. Because most users will not write their ACL regex normally, and the regex has to match a forward-coded domain anyway. The squid algorithm works on forward-coded domains.

A fair test therefore uses each method's native comparison style, with forward-coded domains as input. dstdomain does not even really use a terminator equivalent to $ in its matches, though one is assumed.
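
For readers unfamiliar with dstdomain, a rough Python sketch of the matching semantics on forward-coded domains follows. It assumes the documented convention that an entry with a leading dot matches the domain and all its subdomains, while an entry without one matches that exact name only. This is a plain linear scan with a hypothetical dstdomain_match() helper, not Squid's splay tree code, so it says nothing about Squid's speed.

```python
def dstdomain_match(host, entries):
    """Return True if host matches any dstdomain-style entry.

    Assumed semantics: ".example.com" matches example.com and any
    subdomain; "example.com" (no leading dot) matches that name only.
    """
    host = host.lower().rstrip(".")
    for entry in entries:
        entry = entry.lower()
        if entry.startswith("."):
            # Compare against the bare domain, or require the dot so that
            # "badexample.com" does not match ".example.com".
            if host == entry[1:] or host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False

print(dstdomain_match("www.example.com", [".example.com"]))  # True
print(dstdomain_match("badexample.com", [".example.com"]))   # False
print(dstdomain_match("www.example.org", ["example.org"]))   # False
```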

Your initial claim was that simply assembling the regex was faster than dstdomain comparison. You've provided the regex numbers. I'm working on the sourcelayout project, which should simplify the code so we can build a benchmark test app for dstdomain easily sometime soon.

Just a guesstimate (not knowing the average domain length you used, my numbers assume max-length 256-byte domain names): I expect it matches at over 200k domains per second on a single-CPU 2.8GHz machine.


(^|\.)example\.com$  .. runtime 2.2 secs
^moc\.elpmaxe(\.|$)  .. runtime 1.3 secs
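
The two patterns above should accept the same set of hostnames, provided the second is applied to the reversed string. A small Python sketch to check that equivalence (the helper names and test hostnames are made up; timings are not reproduced here):

```python
import re

FORWARD = re.compile(r"(^|\.)example\.com$")   # applied to the hostname as-is
REVERSED = re.compile(r"^moc\.elpmaxe(\.|$)")  # applied to the reversed hostname

def match_forward(host):
    return FORWARD.search(host) is not None

def match_reversed(host):
    # Reversing turns the suffix test into a prefix test, so the pattern
    # can be anchored at the start of the string.
    return REVERSED.search(host[::-1]) is not None

samples = ["example.com", "www.example.com", "badexample.com",
           "example.com.evil.test", "example.org"]
for host in samples:
    assert match_forward(host) == match_reversed(host)
    print(host, match_forward(host))
```

With the reversed form a non-matching hostname can usually be rejected on its first few characters, rather than after scanning for a possible suffix position, which is consistent with the shorter runtime.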

No one is suggesting that dstdomain should be replaced by regexes though.
This just proves that if you need them, they can be used efficiently.

You implied it very strongly with your statement that we should stop recommending dstdomain for domain-only ACLs. The informed developers have never said NO regex, only pointed out uses where it's not worth using. One of the major optimizations I myself promote is adding a src ACL on each access line, to restrict how often regex or other 'slow' ACLs get tested in the first place.
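
The effect of putting a cheap src test in front of a regex can be modelled in a few lines of Python. This is only a model, assuming the ACLs on one access line are ANDed and tested left to right, stopping at the first that fails. All names here (src_acl, regex_acl, line_matches, the addresses and hosts) are hypothetical and none of this is Squid code.

```python
import ipaddress
import re

calls = {"regex": 0}  # counts how often the 'slow' test actually runs

def src_acl(network):
    """Cheap test: is the client address inside the given network?"""
    net = ipaddress.ip_network(network)
    return lambda req: ipaddress.ip_address(req["client"]) in net

def regex_acl(pattern):
    """'Slow' test: regex match against the requested host."""
    compiled = re.compile(pattern)
    def check(req):
        calls["regex"] += 1
        return compiled.search(req["host"]) is not None
    return check

def line_matches(acls, req):
    # all() over a generator stops at the first ACL that returns False.
    return all(acl(req) for acl in acls)

localnet = src_acl("192.168.0.0/16")
blocked = regex_acl(r"(^|\.)example\.com$")

requests = [
    {"client": "192.168.1.10", "host": "www.example.com"},
    {"client": "10.0.0.5", "host": "www.example.com"},
    {"client": "10.0.0.6", "host": "other.test"},
]
results = [line_matches([localnet, blocked], r) for r in requests]
print(results, calls["regex"])
```

In this model the regex runs only for the one request whose client address passes the src test; the other two are rejected before the regex is reached.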

Amos
--
Please use Squid 2.7.STABLE4 or 3.0.STABLE9
