I was investigating the hash performance of m4 1.4.4, based on an idea in a patch submitted back in January to use pointer comparison instead of string comparison when possible. To get an idea of how bad things get, I hacked m4 to output profile information on exit, and found that when autom4te.cache does not exist, running aclocal during the bootstrap of m4 CVS branch-1_4 (which has a relatively simple configure.ac) spent lots of time doing string comparisons (the lookup modes are lookup, insert, delete, pushdef, popdef):

m4: lookup mode 0 called 229730 times, 557880 compares, 348729 misses, 4672624 bytes
m4: lookup mode 1 called 3831 times, 9733 compares, 6202 misses, 149343 bytes
m4: lookup mode 2 called 54 times, 213 compares, 159 misses, 2904 bytes
m4: lookup mode 3 called 4256 times, 7130 compares, 5735 misses, 107876 bytes
m4: lookup mode 4 called 2888 times, 6394 compares, 3506 misses, 101411 bytes

m4: lookup mode 0 called 1422 times, 2243 compares, 829 misses, 23082 bytes
m4: lookup mode 1 called 233 times, 41 compares, 36 misses, 467 bytes
m4: lookup mode 2 called 49 times, 50 compares, 6 misses, 328 bytes
m4: lookup mode 3 called 0 times, 0 compares, 0 misses, 0 bytes
m4: lookup mode 4 called 0 times, 0 compares, 0 misses, 0 bytes

m4: lookup mode 0 called 1372 times, 2115 compares, 748 misses, 22418 bytes
m4: lookup mode 1 called 264 times, 79 compares, 43 misses, 1205 bytes
m4: lookup mode 2 called 49 times, 50 compares, 6 misses, 328 bytes
m4: lookup mode 3 called 0 times, 0 compares, 0 misses, 0 bytes
m4: lookup mode 4 called 0 times, 0 compares, 0 misses, 0 bytes

Note that calling aclocal invoked m4 three times (via autom4te), and that the later two runs benefited greatly from the cache created by the first. Not shown are another three runs each for autoconf, autoheader, and automake, all showing a huge first-run slowdown thanks to lots of comparisons. The combination of modes 1 and 3 shows that we do more than 8000 macro definitions during the course of creating configure.

Even more telling is autoreconf on CVS coreutils, which has a more complex configure.ac (in part due to gnulib). One line of profile shows that m4 had to plow through 84 megabytes of data to do 2.5 million lookups; not shown is that there were more than 80000 macro definitions (an order of magnitude more than in m4).

m4: lookup mode 0 called 2526799 times, 9834148 compares, 7470505 misses, 84329947 bytes

However, it seems there are two things we could improve.

First, should autom4te experiment with changing the default hash size, using the -H option? By default, m4 1.4.4 uses a 509-bucket hash table with no dynamic growth. Configure scripts have become complex enough that, without a larger table, you generate loads of collisions and spend extra time comparing strings. But what size is the best trade-off of memory for speed, and how do we judge how complex the configure script is? (See the back-of-envelope sketch below.)

Second, should m4 1.4.5 do some simple things to improve execution speed? Currently, m4 1.4.4 stores collided symbols within a bucket in sorted order, and does a strcmp on EVERY symbol up to the point where the correct symbol is found (or is found not to be a macro). My idea is to trade memory for speed: by storing the hash of each string in the table alongside the string itself, we can do a simple integer comparison to greatly narrow down the set of symbols where a strcmp is actually necessary (a rough code sketch also follows below). Is a patch along these lines for m4 considered acceptable this close to the release of m4 1.4.5?
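To put the first question in rough terms, here is a back-of-envelope sketch. It is a throwaway standalone program, not anything from the m4 sources; it assumes a perfectly uniform hash (which real macro names won't quite achieve), the symbol counts are the approximate totals quoted above, and the larger bucket counts are just arbitrary primes I picked to compare against the 509 default. Since a chained lookup does string compares in proportion to the chain it walks, average chain length is a crude proxy for per-lookup cost:

  #include <stdio.h>

  int
  main (void)
  {
    /* Approximate definition counts from the profiles above, and the
       default bucket count (509) plus two arbitrary larger primes.  */
    const double symbols[] = { 8000, 80000 };
    const double buckets[] = { 509, 2053, 8191 };

    for (int i = 0; i < 2; i++)
      for (int j = 0; j < 3; j++)
        printf ("%6.0f symbols in %4.0f buckets: average chain length %5.1f\n",
                symbols[i], buckets[j], symbols[i] / buckets[j]);
    return 0;
  }

Of course the number of symbols live at any instant is smaller than the total number of definitions (pushdef/popdef churn), so this only bounds how bad the chains can get; it is not a measurement.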
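And to make the second idea concrete, here is a rough sketch of what the lookup path could look like with a cached hash. The struct layout, names, and hash function are invented for illustration, and it uses unsorted chains for brevity, so it is not a drop-in for the actual symtab code:

  #include <stddef.h>
  #include <string.h>

  struct symbol
  {
    struct symbol *next;  /* next entry in the same bucket */
    unsigned long hash;   /* cached hash of name (the proposed new field) */
    char *name;
    /* ... macro definition, traced flag, etc. ... */
  };

  /* Stand-in for whatever string hash the table already uses.  */
  unsigned long
  hash_string (const char *s)
  {
    unsigned long h = 0;
    while (*s)
      h = h * 31 + (unsigned char) *s++;
    return h;
  }

  /* Look up NAME among NBUCKETS chains.  m4 1.4.4 runs strcmp on every
     entry it passes; here the cached hash rejects almost all non-matches
     with a single integer comparison, and strcmp only runs on probable
     hits.  */
  struct symbol *
  symtab_lookup (struct symbol **buckets, size_t nbuckets, const char *name)
  {
    unsigned long h = hash_string (name);
    struct symbol *sym;

    for (sym = buckets[h % nbuckets]; sym != NULL; sym = sym->next)
      if (sym->hash == h && strcmp (sym->name, name) == 0)
        return sym;
    return NULL;
  }

The memory cost is one word per symbol, and the hash is already computed at definition time in order to pick the bucket, so caching it there is essentially free.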
CVS head m4 (which will become 2.0) will dynamically grow its hash tables as they reach a fullness threshold, but I'm not willing to backport that much complexity to the 1.4.x branch.

--
Eric Blake