Henrik K wrote:
On Sat, Oct 18, 2008 at 11:54:52PM +1300, Amos Jeffries wrote:
Henrik K wrote:
On Sat, Oct 18, 2008 at 12:44:46PM +0300, Henrik K wrote:
Not sure what the splay code does in Squid, didn't have time to grab it.
Produces a very inefficient unsorted but alphabetically ordered trinary
tree.
But a simple test with Perl:
- Grepped some hostnames from wwwlogs etc
- Regexp::Assemble'd 50000 unique hostnames (= 560kB regex, took 22 sec)
- Ran 100000 hostnames on it in 4 seconds (25000 hosts/sec on 2.8GHz CPU)
It's pretty powerful stuff.
Oops, did it even slightly wrong.
Doing it correctly, using ^hostname$ instead of plain hostname in the regex,
results in 1.2 seconds, that's 80000+ hosts/sec.
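
To illustrate what the anchoring changes, here is a minimal Python sketch. It is not the Perl script used above (which was not posted); the hostnames are made up, and Python's re module does not merge alternatives into a trie the way Regexp::Assemble does, so timings would not be comparable.

```python
import re

# Hypothetical sample hostnames; the original test used 50000 taken from web logs.
hostnames = ["www.example.com", "mail.example.org", "cdn.example.net"]

# One big alternation of escaped hostnames.
alternation = "|".join(re.escape(h) for h in hostnames)

# Unanchored: matches if a listed hostname appears anywhere in the input.
unanchored = re.compile(alternation)

# Anchored: the whole input must equal one of the listed hostnames.
anchored = re.compile(r"^(?:" + alternation + r")$")


def matches_unanchored(host):
    return unanchored.search(host) is not None


def matches_anchored(host):
    return anchored.search(host) is not None
```

With the anchors in place the engine only has to try the pattern at the start of the input, which is one plausible reason the anchored run was faster.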
Still out slightly. A fair test of that against the Squid splay tree would
still omit the ^, so as to match any given *.example.com$
A fair test would be reversing the hostname, which is a very cheap operation. ;)
No. Because most users will not write their ACL regex normally, and the
regex has to match a forward-coded domain anyway. The squid algorithm
works on forward-coded domains.
A fair test, therefore, uses each method's native comparison style with
forward-coded domains as input. dstdomain does not even really use the
terminator equivalent to $ in its matches, though it is assumed.
Your initial claim was that simply assembling the regex was faster than
dstdomain comparison.
You've provided the regex numbers. I'm working on the sourcelayout
project, which should simplify the code so we can build a benchmark test
app for dstdomain easily sometime soon.
Just a guesstimate (not knowing the average domain length you used, my
numbers assume max-length 256-byte domain names). I expect it matches at
over 200k domains per second on a single-CPU 2.8GHz machine.
(^|\.)example\.com$ .. runtime 2.2 secs
^moc\.elpmaxe(\.|$) .. runtime 1.3 secs
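
For reference, the two patterns above can be checked for equivalence with a short Python sketch. The helper names are hypothetical, and this only demonstrates that both forms accept the same hostnames; it says nothing about Squid's own matching or about the Perl timings.

```python
import re

# Forward-coded pattern: example.com itself or any subdomain of it.
forward = re.compile(r"(^|\.)example\.com$")

# The same rule written against the reversed hostname string.
reversed_pat = re.compile(r"^moc\.elpmaxe(\.|$)")


def match_forward(host):
    return forward.search(host) is not None


def match_reversed(host):
    # Reversing the string is a cheap slice; the pattern is then
    # anchored at the start instead of the end.
    return reversed_pat.search(host[::-1]) is not None
```

The reversed form lets the engine reject a non-matching host on its first few characters, which is probably why it ran faster here.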
No one is suggesting that dstdomain should be replaced by regexs though.
This just proves that if you need them, they can be used efficiently.
You implied it very strongly with your statement that we should stop
recommending dstdomain for domain-only ACL. The informed developers have
never said NO regex. Only pointed out uses where it's not worth using.
One of the major optimizations I myself promote is adding a src ACL on
each access line to restrict how often regex or other 'slow' ACLs get
tested to start with.
Amos
--
Please use Squid 2.7.STABLE4 or 3.0.STABLE9