2009/8/17 Chris Cannam <cannam@xxxxxxxxxxxxxxxxxxxxx>:
> On Mon, Aug 17, 2009 at 5:21 AM, Ken Restivo <ken@xxxxxxxxxxx> wrote:
>> I'm trying to squeeze the last little bit of juice out of my EEE.
>>
>> The CPU I have is this:
>> http://restivo.org/projects/eee/cpu.txt
>>
>> This nifty script at http://www.pixelbeat.org/scripts/gcccpuopt says I
>> should use "-march=core2 -mtune=pentium -mfpmath=sse"
>>
>> However, the Gentoo people (who I take to be an -funrollloops authority
>> on performance tuning) say I should use
>> "-march=core2 -mtune=generic -fomit-frame-pointer -pipe".
>>
>> And then there is -march=native, which many say is just easier and
>> faster. And others recommend adding "-msse2" and other such things.
>>
>> What say you-all?
>
> If you want the fastest possible floating point code, then you
> probably want something like:
>
> -march=core2 -msse -msse2 -mfpmath=sse -ffast-math -fomit-frame-pointer -O3
>
> ... but with caveats.
>
> Discussion:
>
> Supplying -ffast-math causes the use of non-IEEE-compliant math
> functions. Among other things, this screws up any code that
> explicitly deals with infinity or NaN values or signed zeroes, and
> makes assumptions about properties like associativity for the purposes
> of optimisation which may not be true in the floating-point world. In
> other words, it can give you the wrong results. In _most_ cases,
> audio applications are fine with it, but you need to be aware that it
> can be problematic.
>
> However, -ffast-math in combination with -mfpmath=sse has the very
> nice quality that it enables denormal flush to zero throughout, thus
> avoiding denormal slowdowns in filters and the like. It's also much
> faster for some of the apparently simple operations like floor() that
> are surprisingly slow in IEEE compliant mode.
>
> It might be interesting to know what the authors of the programs
> you're trying to optimise thought about the use of -ffast-math...
> Perhaps you could compile them both ways and compare the output.

On the SuperCollider dev list we're just having a conversation about
exactly this. NaNs are used in some cases for signalling, and since
compiling with -ffast-math implies -ffinite-math-only, that trashes
the NaN signalling. This combination seems OK though:
"-ffast-math -fno-finite-math-only". The moral of the story is
probably that it depends strongly on the app. Who knows whether your
chosen software makes use of NaNs and infinities? Hard to tell.

Dan

> -fomit-frame-pointer is pretty much guaranteed to make things
> marginally faster but harder to debug. It won't break anything and it
> won't make any huge improvements.
>
> -O3 rather than -O2 because it enables -ftree-vectorize, which does
> some limited auto-vectorization of loops for things like
> floating-point copy into SSE operations. This doesn't always do
> anything (depends on the code, obviously) but sometimes it makes a
> significant difference, for example it helps when compiling my Rubber
> Band library. I've never yet seen any problems with the results, but
> of course there's always an increased risk of running into
> optimisation bugs the more optimisation you do. You can get
> interesting (?) debug output about vectorization successes and
> failures (mostly failures) with e.g. -ftree-vectorizer-verbose=2.
>
> I would be slightly suspicious of anyone who recommends -pipe as an
> optimisation -- it makes no difference to the resulting code, it just
> makes compiling faster.
>
> If you're using a 64-bit distro, then you can omit the options with
> SSE in them (they're all enabled by default in 64-bit gcc).
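For anyone who wants to try the "compile both ways and compare" idea on
the NaN question specifically, here is a rough sketch in C. It is not
taken from SuperCollider or any other real program: the function names
and the idea of a NaN "sentinel" are made up for illustration. It
assumes GCC, which as far as I know defines __FINITE_MATH_ONLY__ to 1
when -ffinite-math-only is in effect, and it assumes IEEE 754
single-precision floats.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if this file was compiled with -ffinite-math-only (which
 * -ffast-math implies), as reported by GCC's __FINITE_MATH_ONLY__. */
int finite_math_only_enabled(void)
{
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
    return 1;
#else
    return 0;
#endif
}

/* NaN test on the raw bit pattern: exponent all ones, mantissa non-zero.
 * It uses only integer operations, so it should not depend on how the
 * compiler treats isnan() under the fast-math flags. */
int is_nan_bits(float x)
{
    uint32_t bits;
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);
    return (bits & 0x7f800000u) == 0x7f800000u && (bits & 0x007fffffu) != 0u;
}

/* Hypothetical "no value" marker of the kind an app might signal with. */
float make_sentinel(void)
{
    return NAN;
}

/* What a plain isnan() check reports for the sentinel. The volatile
 * keeps the value from being folded away at compile time; under
 * -ffinite-math-only the compiler may still assume the answer is 0. */
int isnan_sees_sentinel(void)
{
    volatile float s = make_sentinel();
    return isnan(s) ? 1 : 0;
}
```

Build it into a small test program three times (plain -O2, then with
-ffast-math, then with "-ffast-math -fno-finite-math-only") and print
the return values of finite_math_only_enabled() and
isnan_sees_sentinel() each time. If the second build reports 0 where
the others report 1, any NaN signalling in the app is at risk from that
flag set.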
>
>
> Chris
> _______________________________________________
> Linux-audio-user mailing list
> Linux-audio-user@xxxxxxxxxxxxxxxxxxxx
> http://lists.linuxaudio.org/mailman/listinfo/linux-audio-user
>

-- 
http://www.mcld.co.uk