A colleague and I have developed a fancy compiler for a new sort of advanced numeric programming language. The output of this compiler is C source code. Although optimized in some respects, this C is somewhat bizarre in others. In particular, it defines gobs of new structure types and gobs of very, very short functions, and it uses no pointers.

It should be possible, using the optimization techniques already present in GCC, for very tight machine code to be generated from this admittedly strange FORTRAN-style C source code. But instead, the assembly code GCC generates is full of unnecessary data shuffling. There is so much data shuffling that it outnumbers the useful arithmetic instructions by a factor of hundreds, causing a slowdown in the generated executable of similar magnitude.

The poor optimization is present in every version of GCC and with every combination of optimization flags we have tried, although it does seem to be a little better in GCC 4.2.

What I'm hoping for is one of the following:

 - Some new GCC option magic that would get this all optimized.

 - Some small change we could make to the generated C sources that would cause them to be optimized well. (Add some magic __attribute__ somewhere.)

 - Some other magic (rebuild GCC with build option XXX, or patch the GCC sources *here* and *here*) that would make it optimize well.

 - Some combination of the above.

 - A pointer to some other compiler (horrors!) that would optimize this well.

The C sources and the generated assembly are too long to attach below. Instead, I am making them available at

 http://www.bcl.hamilton.ie/~barak/stalingrad-vs-gcc/

Below are notes that include detailed version information on the compilers used. In the notes below we used

 -O2 -freg-struct-return -fomit-frame-pointer -mfpmath=sse -msse3

but changing these flags does not seem to improve the results.

Our thanks, to anyone who takes up the challenge, for looking at and thinking about this issue.

-- 
Barak A. Pearlmutter <barak@xxxxxxxxxx>
 Hamilton Institute & Dept Comp Sci, NUI Maynooth, Co. Kildare, Ireland
 http://www.bcl.hamilton.ie/~barak/

----------------------------------------------------------------
--- NOTES ---
----------------------------------------------------------------

$ gcc-4.1 -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --with-tune=i686 --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
$ gcc-4.1 -S -O2 -Wall -freg-struct-return -fomit-frame-pointer -mfpmath=sse -msse3 particle1.c
particle1.c:10763: warning: 'f95' defined but not used
particle1.c:10775: warning: 'f110' defined but not used
particle1.c:10788: warning: 'f126' defined but not used
particle1.c:10887: warning: 'f273' defined but not used
particle1.c:10888: warning: 'f274' defined but not used
particle1.c:10889: warning: 'f275' defined but not used
particle1.c:10890: warning: 'f277' defined but not used
particle1.c:12456: warning: 'f2456' defined but not used
particle1.c:12478: warning: 'f2482' defined but not used
particle1.c:12583: warning: 'f2623' defined but not used
particle1.c:12631: warning: 'f2690' defined but not used
particle1.c:12678: warning: 'f2752' defined but not used
particle1.c:12720: warning: 'f2828' defined but not used
$ mv particle1.s particle1-gcc41.s

$ gcc-4.2 -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2 --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --enable-targets=all --disable-werror --enable-checking=release --build=i486-linux-gnu --host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.2.1 20070525 (prerelease) (Debian 4.2-20070525-1)
$ gcc-4.2 -S -O2 -Wall -freg-struct-return -fomit-frame-pointer -mfpmath=sse -msse3 particle1.c
particle1.c:10763: warning: 'f95' defined but not used
particle1.c:10775: warning: 'f110' defined but not used
particle1.c:10788: warning: 'f126' defined but not used
particle1.c:10887: warning: 'f273' defined but not used
particle1.c:10888: warning: 'f274' defined but not used
particle1.c:10889: warning: 'f275' defined but not used
particle1.c:10890: warning: 'f277' defined but not used
particle1.c:12456: warning: 'f2456' defined but not used
particle1.c:12478: warning: 'f2482' defined but not used
particle1.c:12583: warning: 'f2623' defined but not used
particle1.c:12631: warning: 'f2690' defined but not used
particle1.c:12678: warning: 'f2752' defined but not used
particle1.c:12720: warning: 'f2828' defined but not used
$ mv particle1.s particle1-gcc42.s

$ gcc-2.95 -v
Reading specs from /usr/lib/gcc-lib/i486-linux-gnu/2.95.4/specs
gcc version 2.95.4 20011002 (Debian prerelease)
$ gcc-2.95 -S -O2 -Wall -freg-struct-return -fomit-frame-pointer -mfpmath=sse -msse3 particle1.c
cc1: Invalid option `fpmath=sse'
cc1: Invalid option `sse3'
particle1.c: In function `write_real':
particle1.c:7: warning: use of `l' length character with `g' type character
particle1.c: At top level:
particle1.c:10763: warning: `f95' defined but not used
particle1.c:10775: warning: `f110' defined but not used
particle1.c:10788: warning: `f126' defined but not used
particle1.c:10887: warning: `f273' defined but not used
particle1.c:10888: warning: `f274' defined but not used
particle1.c:10889: warning: `f275' defined but not used
particle1.c:10890: warning: `f277' defined but not used
particle1.c:12456: warning: `f2456' defined but not used
particle1.c:12478: warning: `f2482' defined but not used
particle1.c:12583: warning: `f2623' defined but not used
particle1.c:12631: warning: `f2690' defined but not used
particle1.c:12678: warning: `f2752' defined but not used
particle1.c:12720: warning: `f2828' defined but not used
$ mv particle1.s particle1-gcc295.s

$ gcc-3.3 -v
Reading specs from /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/specs
Configured with: ../src/configure -v --enable-languages=c,c++ --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --enable-__cxa_atexit --with-system-zlib --enable-nls --without-included-gettext --enable-clocale=gnu --enable-debug i486-linux-gnu
Thread model: posix
gcc version 3.3.6 (Debian 1:3.3.6-15)
$ gcc-3.3 -S -O2 -Wall -freg-struct-return -fomit-frame-pointer -mfpmath=sse -msse3 particle1.c
particle1.c:10763: warning: `f95' defined but not used
particle1.c:10775: warning: `f110' defined but not used
particle1.c:10788: warning: `f126' defined but not used
particle1.c:10887: warning: `f273' defined but not used
particle1.c:10888: warning: `f274' defined but not used
particle1.c:10889: warning: `f275' defined but not used
particle1.c:10890: warning: `f277' defined but not used
particle1.c:12456: warning: `f2456' defined but not used
particle1.c:12478: warning: `f2482' defined but not used
particle1.c:12583: warning: `f2623' defined but not used
particle1.c:12631: warning: `f2690' defined but not used
particle1.c:12678: warning: `f2752' defined but not used
particle1.c:12720: warning: `f2828' defined but not used
$ mv particle1.s particle1-gcc33.s

$ gcc-3.4 -v
Reading specs from /usr/lib/gcc/i486-linux-gnu/3.4.6/specs
Configured with: ../src/configure -v --enable-languages=c,c++,f77,pascal --prefix=/usr --libexecdir=/usr/lib --with-gxx-include-dir=/usr/include/c++/3.4 --enable-shared --with-system-zlib --enable-nls --without-included-gettext --program-suffix=-3.4 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --with-tune=i686 i486-linux-gnu
Thread model: posix
gcc version 3.4.6 (Debian 3.4.6-5)
$ gcc-3.4 -S -O2 -Wall -freg-struct-return -fomit-frame-pointer -mfpmath=sse -msse3 particle1.c
particle1.c:10763: warning: 'f95' defined but not used
particle1.c:10775: warning: 'f110' defined but not used
particle1.c:10788: warning: 'f126' defined but not used
particle1.c:10887: warning: 'f273' defined but not used
particle1.c:10888: warning: 'f274' defined but not used
particle1.c:10889: warning: 'f275' defined but not used
particle1.c:10890: warning: 'f277' defined but not used
particle1.c:12456: warning: 'f2456' defined but not used
particle1.c:12478: warning: 'f2482' defined but not used
particle1.c:12583: warning: 'f2623' defined but not used
particle1.c:12631: warning: 'f2690' defined but not used
particle1.c:12678: warning: 'f2752' defined but not used
particle1.c:12720: warning: 'f2828' defined but not used
$ mv particle1.s particle1-gcc34.s

$ gcc -o particle1 particle1.c -lm
$ ./particle1
0.01999188620615792

$ ls -l
-rw-rw-r-- 1 barak barak    6764 2007-05-27 14:38 NOTES
-rwxrwxr-x 1 barak barak  736714 2007-05-27 13:08 particle1
-rw-r--r-- 1 barak barak  901853 2007-05-27 12:14 particle1.c
-rw-r--r-- 1 barak barak 2383226 2007-05-27 12:41 particle1-gcc295.s
-rw-r--r-- 1 barak barak 7291988 2007-05-27 12:46 particle1-gcc33.s
-rw-r--r-- 1 barak barak 8005026 2007-05-27 12:55 particle1-gcc34.s
-rw-rw-r-- 1 barak barak 1703481 2007-05-27 12:33 particle1-gcc41.s
-rw-r--r-- 1 barak barak 1000722 2007-05-27 12:36 particle1-gcc42.s

$ wc --lines particle1.c
12825 particle1.c
$ wc --lines *.s
163922 particle1-gcc295.s
343012 particle1-gcc33.s
353057 particle1-gcc34.s
100697 particle1-gcc41.s
47030 particle1-gcc42.s
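
For readers who would rather not download the full sources, here is a minimal hand-written sketch of the code shape described at the top of this message: small structure types passed and returned by value, very short functions, and no pointers. It is not an excerpt from particle1.c. The names vec2, state, vec2_make, vec2_add, vec2_scale and step are hypothetical, chosen only for illustration, and the real generated code is far larger.

```c
/* Illustrative sketch only: hypothetical names, not taken from particle1.c. */

/* Small structure types, always handled by value. */
struct vec2 { double x; double y; };
struct state { struct vec2 pos; struct vec2 vel; };

/* Very short helper functions; no pointers anywhere. */
static struct vec2 vec2_make(double x, double y)
{
    struct vec2 r;
    r.x = x;
    r.y = y;
    return r;
}

static struct vec2 vec2_add(struct vec2 a, struct vec2 b)
{
    return vec2_make(a.x + b.x, a.y + b.y);
}

static struct vec2 vec2_scale(double k, struct vec2 a)
{
    return vec2_make(k * a.x, k * a.y);
}

/* One update step: advance the velocity by dt*acc, then advance the
   position by dt times the new velocity. Every intermediate value is
   a structure copied in and out of a tiny function. */
struct state step(struct state s, struct vec2 acc, double dt)
{
    struct state r;
    r.vel = vec2_add(s.vel, vec2_scale(dt, acc));
    r.pos = vec2_add(s.pos, vec2_scale(dt, r.vel));
    return r;
}
```

Compiling a file like this with gcc -S and the flags listed in the notes gives an assembly listing short enough to read by hand, so the moves can be counted against the arithmetic instructions. Whether a case this small reproduces the data shuffling seen in particle1.c is untested.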