I was investigating the hash performance of m4 1.4.4, based on an idea in a patch submitted back in January to use pointer comparison instead of string comparison when possible. To get an idea of how bad things get, I hacked m4 to output profile information on exit, and found that when autom4te.cache does not exist, running aclocal during the bootstrap of m4 CVS branch-1_4 (which has a relatively simple configure.ac) spent lots of time doing string comparisons (the lookup modes are lookup, insert, delete, pushdef, popdef):

m4: lookup mode 0 called 229730 times, 557880 compares, 348729 misses, 4672624 bytes
m4: lookup mode 1 called 3831 times, 9733 compares, 6202 misses, 149343 bytes
m4: lookup mode 2 called 54 times, 213 compares, 159 misses, 2904 bytes
m4: lookup mode 3 called 4256 times, 7130 compares, 5735 misses, 107876 bytes
m4: lookup mode 4 called 2888 times, 6394 compares, 3506 misses, 101411 bytes

m4: lookup mode 0 called 1422 times, 2243 compares, 829 misses, 23082 bytes
m4: lookup mode 1 called 233 times, 41 compares, 36 misses, 467 bytes
m4: lookup mode 2 called 49 times, 50 compares, 6 misses, 328 bytes
m4: lookup mode 3 called 0 times, 0 compares, 0 misses, 0 bytes
m4: lookup mode 4 called 0 times, 0 compares, 0 misses, 0 bytes

m4: lookup mode 0 called 1372 times, 2115 compares, 748 misses, 22418 bytes
m4: lookup mode 1 called 264 times, 79 compares, 43 misses, 1205 bytes
m4: lookup mode 2 called 49 times, 50 compares, 6 misses, 328 bytes
m4: lookup mode 3 called 0 times, 0 compares, 0 misses, 0 bytes
m4: lookup mode 4 called 0 times, 0 compares, 0 misses, 0 bytes

Note that calling aclocal invoked m4 three times (via autom4te), and that the later two runs benefited greatly from the cache created by the first. Not shown are another three runs each for autoconf, autoheader, and automake, all showing a huge first-run slowdown thanks to lots of comparisons. The combination of modes 1 and 3 shows that we do more than 8000 macro definitions during the course of creating configure.

Even more telling is autoreconf on CVS coreutils, which has a more complex configure.ac (in part due to gnulib). One line of profile shows that m4 had to plow through 84 megabytes of data to do 2.5 million lookups; not shown is that there were more than 80000 macro definitions (an order of magnitude more than in m4).

m4: lookup mode 0 called 2526799 times, 9834148 compares, 7470505 misses, 84329947 bytes

However, it seems there are two things we could improve.

First, should autom4te experiment with changing the default hash size, using the -H option? By default, m4 1.4.4 uses a 509-bucket hash table with no dynamic growth. Configure scripts have become complex enough that, without a larger table, you generate loads of collisions and spend extra time comparing strings. But what size is the best trade-off of memory for speed, and how do we judge how complex the configure script is? (See the back-of-envelope sketch below.)

Second, should m4 1.4.5 do some simple things to improve execution speed? Currently, m4 1.4.4 stores collided symbols within a bucket in sorted order, and does a strcmp on EVERY symbol up to the point where the correct symbol is found (or is found not to be a macro). My idea is to trade memory for speed: by storing the hash of each string in the table alongside the string itself, we can do a simple integer comparison to greatly narrow down the set of symbols where a strcmp is actually necessary (a rough code sketch also follows below). Is a patch along these lines for m4 considered acceptable this close to the release of m4 1.4.5?
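To put the first question in rough terms, here is a back-of-envelope sketch. It is a throwaway standalone program, not anything from the m4 sources; it assumes a perfectly uniform hash (which real macro names won't quite achieve), the symbol counts are the approximate totals quoted above, and the larger bucket counts are just arbitrary primes I picked to compare against the 509 default. Since a chained lookup does string compares in proportion to the chain it walks, average chain length is a crude proxy for per-lookup cost:

  #include <stdio.h>

  int
  main (void)
  {
    /* Approximate definition counts from the profiles above, and the
       default bucket count (509) plus two arbitrary larger primes.  */
    const double symbols[] = { 8000, 80000 };
    const double buckets[] = { 509, 2053, 8191 };

    for (int i = 0; i < 2; i++)
      for (int j = 0; j < 3; j++)
        printf ("%6.0f symbols in %4.0f buckets: average chain length %5.1f\n",
                symbols[i], buckets[j], symbols[i] / buckets[j]);
    return 0;
  }

Of course the number of symbols live at any instant is smaller than the total number of definitions (pushdef/popdef churn), so this only bounds how bad the chains can get; it is not a measurement.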
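And to make the second idea concrete, here is a rough sketch of what the lookup path could look like with a cached hash. The struct layout, names, and hash function are invented for illustration, and it uses unsorted chains for brevity, so it is not a drop-in for the actual symtab code:

  #include <stddef.h>
  #include <string.h>

  struct symbol
  {
    struct symbol *next;  /* next entry in the same bucket */
    unsigned long hash;   /* cached hash of name (the proposed new field) */
    char *name;
    /* ... macro definition, traced flag, etc. ... */
  };

  /* Stand-in for whatever string hash the table already uses.  */
  unsigned long
  hash_string (const char *s)
  {
    unsigned long h = 0;
    while (*s)
      h = h * 31 + (unsigned char) *s++;
    return h;
  }

  /* Look up NAME among NBUCKETS chains.  m4 1.4.4 runs strcmp on every
     entry it passes; here the cached hash rejects almost all non-matches
     with a single integer comparison, and strcmp only runs on probable
     hits.  */
  struct symbol *
  symtab_lookup (struct symbol **buckets, size_t nbuckets, const char *name)
  {
    unsigned long h = hash_string (name);
    struct symbol *sym;

    for (sym = buckets[h % nbuckets]; sym != NULL; sym = sym->next)
      if (sym->hash == h && strcmp (sym->name, name) == 0)
        return sym;
    return NULL;
  }

The memory cost is one word per symbol, and the hash is already computed at definition time in order to pick the bucket, so caching it there is essentially free.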
CVS head m4 (which will become 2.0) will dynamically grow its hash tables as they reach a fullness threshold, but I'm not willing to backport that much complexity to the 1.4.x branch.

--
Eric Blake