Success! Some working magic seems to be this:

    gcc -s -o particle1 \
        -O3 \
        -march=k8 \
        -mfpmath=sse \
        -finline-limit=100000 \
        --param large-function-insns=1000000 \
        --param inline-unit-growth=1000000 \
        --param sra-field-structure-ratio=0 \
        particle1.c -lm

although it looks like -Os gives an additional improvement.

This (with GCC 4.1) reduces code volume to about 16k from a previous
near 1M, and reduces runtime by a factor of about 2700, as compared to
just -O3. Further improvements welcome.

I'd also suggest adding a section to the GCC documentation on "how to
use GCC as a back-end to another compiler" which gives some typical
magic options like the above that would be useful in circumstances
like these.

-- 
Barak A. Pearlmutter <barak@xxxxxxxxxx>
 Hamilton Institute & Dept Comp Sci, NUI Maynooth, Co. Kildare, Ireland
 http://www.bcl.hamilton.ie/~barak/