On Sun, May 27, 2007 at 03:05:38PM +0100, Barak A. Pearlmutter wrote:

> In particular, it defines gobs of new
> structure types and gobs of very very short functions, and there are
> no pointers used.  It should be possible, using the optimization
> techniques already present in GCC, for very tense machine code to be
> generated from this admittedly strange FORTRAN-style C source code.
> But instead, the assembly code GCC generates is full of unnecessary
> data shuffling.

The way you are using structures forces GCC to copy data around. Unless you
somehow manage to inline the whole program into main(), I don't see how it
can be any different.

>  - Some small change we could make to the generated C sources that
>    would cause it to be optimized well.  (Add some magic __attribute__
>    somewhere.)

Change the structures into scalar variables for a start. GCC has more freedom
to place scalar variables than structures. Also, try to arrange function
parameters such that sibling call optimization has a chance of working.

BAD:

int g (int c, int b, int a)
{
  ...
}

int f (int a, int b, int c)
{
  return g (c, b, a);
}

GOOD:

int g (int a, int b, int c)
{
  ...
}

int f (int a, int b, int c)
{
  return g (a, b, c);
}

> Below are notes that include detailed version information on the
> compilers used.  In the notes below we used
>  -O2 -freg-struct-return -fomit-frame-pointer -mfpmath=sse -msse3
> but the results don't seem to improve by changing them.

You will definitely want a lot of inlining for this sort of code, so at least
use -O3, but perhaps play with the inlining parameters too. On a side note,
consider using -march to tell GCC which model of CPU you intend to run the
code on.

>  $ wc --lines *.s
>   163922 particle1-gcc295.s
>   343012 particle1-gcc33.s
>   353057 particle1-gcc34.s
>   100697 particle1-gcc41.s
>    47030 particle1-gcc42.s

I imagine you'll be enlightened by running

$ for i in *.s; do echo -n "${i}: "; grep -F -e memcpy "${i}" | wc --lines; done

-- 
Rask Ingemann Lambertsen
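
For illustration, here is a minimal sketch of the two source-level suggestions above: passing scalars instead of small structures, and keeping the parameter order identical between caller and callee. All names (vec2, dot_struct, dot_scalar, f_same, g_same, f_perm, g_perm) are hypothetical and do not come from the generated code under discussion. Whether GCC actually avoids the copies or emits a sibling call depends on the GCC version, target, and options.

```c
/* Hypothetical example; none of these names come from the original program. */

/* Structure style: small aggregates passed by value.  Depending on the
   ABI and the GCC version, the arguments may be copied through memory.  */
struct vec2 { double x; double y; };

double dot_struct (struct vec2 a, struct vec2 b)
{
  return a.x * b.x + a.y * b.y;
}

/* Scalar style: the same computation with each component as a separate
   scalar parameter, which the compiler is free to keep in registers.  */
double dot_scalar (double ax, double ay, double bx, double by)
{
  return ax * bx + ay * by;
}

/* Same parameter order in caller and callee: the arguments are already
   where g_same expects them, so the call can become a plain jump.  */
int g_same (int a, int b, int c)
{
  return a - b * c;
}

int f_same (int a, int b, int c)
{
  return g_same (a, b, c);
}

/* Permuted parameter order: the same result, but the arguments have to
   be moved around before control reaches g_perm.  */
int g_perm (int c, int b, int a)
{
  return a - b * c;
}

int f_perm (int a, int b, int c)
{
  return g_perm (c, b, a);
}
```

One way to compare the variants is to compile with something like gcc -O3 -S and check whether f_same ends in a jump to g_same rather than a call, and how much argument shuffling precedes the call in f_perm.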