On Sun, May 27, 2007 at 10:28:09PM +0100, Barak A. Pearlmutter wrote:
> Hope you don't mind if I ask some follow-up questions.

Not at all.

> Yup: we had also noticed the zillions of calls to memcpy with static
> arguments. This is part of what I meant by "unnecessary data
> shuffling". Is there some way to tell GCC that it isn't worth calling
> memcpy to copy such short structures?

GCC optimizes memcpy according to the size of the memory block and the
CPU it is optimizing for. I'm not sure the most recent work on
optimizing memcpy() for x86 processors went into GCC 4.2, though.

> We could re-jigger our back end to generate FORTRAN instead of C and
> use GCC's FORTRAN stuff, maybe that would help?

I don't know FORTRAN. I have no idea.

> > You will definitely want a lot of inlining for this sort of code, so
> > at least use -O3, but perhaps play with the inlining parameters too.
>
> Right; -O3 didn't make any qualitative difference. (I certainly tried
> that before posting.) I do see a whole bunch of inline-related
> parameters in the GCC documentation, but it is not clear which I
> should tweak. I tried -O3 -finline-limit=60000 (default 600) but
> even that doesn't make any qualitative difference.

You *really* need to crank up those limits. I don't have GCC 4.2, but I
tried GCC 4.3 with

  --param inline-call-cost=10000 --param max-inline-insns-auto=20000
  --param large-function-growth=1000 --param inline-unit-growth=1000

which wasn't enough. I ran out of memory (256 MB RAM + 757 MB swap) with

  -finline-limit=60000 --param inline-call-cost=10000
  --param max-inline-insns-auto=200000 --param large-function-growth=10000
  --param inline-unit-growth=10000

Some versions of GCC need much more memory than others. YMMV.

I only just noticed that you have many functions marked inline. In that
case you also want to increase the parameter max-inline-insns-single.

-- 
Rask Ingemann Lambertsen
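
For illustration only, here is a minimal sketch of the kind of short-structure copy discussed above. The struct `pair` and the functions `copy_memcpy`, `copy_assign` and `copies_agree` are hypothetical names, not taken from the generated code in question. Whether GCC expands the memcpy call inline or emits a real call depends on the GCC version, the optimization flags and the target CPU, so the assembly from `gcc -O2 -S` is worth comparing for both variants.

```c
#include <string.h>

/* Hypothetical two-field structure standing in for the short
 * structures mentioned in the thread. */
struct pair {
    double x;
    double dx;
};

/* Copy through memcpy with a size known at compile time. */
static inline struct pair copy_memcpy(const struct pair *src)
{
    struct pair dst;
    memcpy(&dst, src, sizeof dst);
    return dst;
}

/* Copy through plain structure assignment. */
static inline struct pair copy_assign(const struct pair *src)
{
    return *src;
}

/* Returns 1 if both copy styles produce the same field values. */
static int copies_agree(void)
{
    struct pair p = { 1.5, 2.5 };
    struct pair a = copy_memcpy(&p);
    struct pair b = copy_assign(&p);
    return a.x == b.x && a.dx == b.dx;
}
```

Both functions are semantically the same copy; the sketch only gives a small test case for checking what a particular compiler does with each form.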