Re: performance question with std::complex<float> in new g++ versions

On Sat, 16 Jan 2010, Thomas Witzel wrote:

I'm stuck on a silly issue and I'm hoping there is a simple
solution to it. I have a piece of code that does nothing but
perform a very large number of products between std::complex<float>
values and some float values in a loop.
Using gcc-4.1.2 and gcc-4.2.4, my standard test case runs for about
7:25 minutes and 6:50 minutes on 3.0 GHz Penryn CPUs (single-threaded).
However, when using gcc-4.3.4, gcc-4.4.2, or even the svn version, my
run time is > 40 minutes,
which is a serious drop in performance. For this test I reduced all
compiler options down to -O3 only. I looked a bit at the assembly
code produced, and two things are apparent. First, gcc-4.3
and newer versions produce
assembly code about twice as long as the older gcc versions. Second,
gcc-4.1 and 4.2 write out all the multiplications in SSE code, while
4.3 and newer call a routine named __mulsc3.
Has anybody ever encountered such a performance drop, and does anyone know whether
there is a compiler flag or something to get my performance back?

Try -ffast-math (there may be less aggressive flags, but that's the direction to look in). To perfectly respect the standard definition of complex multiplication, one has to jump through hoops...
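To make the "hoops" concrete, here is a minimal sketch of the textbook product formula and an input where it goes wrong. The names naive_mul and naive_mul_infinite_case are made up for illustration and are not part of libstdc++ or of the original poster's code. As far as I understand it, a conforming multiplication has to notice when both parts of the result come out NaN and then try to recover an infinite result, and that recovery is presumably what the __mulsc3 call is for. The GCC manual also documents -fcx-limited-range, which drops that check and is implied by -ffast-math; whether it behaves the same way in every version mentioned in this thread is something to verify rather than assume.

```cpp
#include <complex>
#include <limits>

// Textbook product (a+bi)(c+di) = (ac - bd) + (ad + bc)i, with no special
// handling of infinities or NaNs. naive_mul is a hypothetical name.
inline std::complex<float> naive_mul(std::complex<float> x, std::complex<float> y) {
    const float a = x.real(), b = x.imag();
    const float c = y.real(), d = y.imag();
    return std::complex<float>(a * c - b * d, a * d + b * c);
}

// An input where the textbook formula degrades: (1 + 0i) * (inf + inf i).
// The real part is 1*inf - 0*inf and the imaginary part is 1*inf + 0*inf.
// Since 0*inf is NaN under IEEE arithmetic, both parts come out NaN even
// though one operand is infinite.
inline std::complex<float> naive_mul_infinite_case() {
    const float inf = std::numeric_limits<float>::infinity();
    return naive_mul(std::complex<float>(1.0f, 0.0f),
                     std::complex<float>(inf, inf));
}
```

For finite inputs naive_mul is just four multiplications, one addition and one subtraction, which is the kind of straight-line code the older compilers emitted. The extra work only matters for the infinite and NaN corner cases, so if the data never contains those, relaxing the check should not change the results.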

Now even with -ffast-math, I am surprised to see that float*complex generates 4 multiplications; you could look through bugzilla to see if there is anything about what looks like a missed optimization.
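For reference, scaling a complex value by a real factor only needs two multiplications, one per component. The following is a rough sketch of doing that by hand; scale and scale_all are hypothetical helper names, not anything from the standard library or from the original poster's code, and I have not measured whether this actually recovers the lost performance.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Scale z by the real factor s component by component: two multiplications,
// no additions. scale is a hypothetical helper name.
inline std::complex<float> scale(float s, const std::complex<float>& z) {
    return std::complex<float>(s * z.real(), s * z.imag());
}

// Hypothetical stand-in for the kind of loop described in the question:
// multiply each complex sample by the matching real weight, in place.
// Assumes weights has at least as many elements as samples.
inline void scale_all(std::vector<std::complex<float>>& samples,
                      const std::vector<float>& weights) {
    for (std::size_t i = 0; i < samples.size(); ++i) {
        samples[i] = scale(weights[i], samples[i]);
    }
}
```

Note that this is not bit-for-bit the same as promoting s to (s + 0i) and doing a full complex multiplication: the full product also computes terms like 0*imag, which can differ for infinities, NaNs and signed zeros. For ordinary finite data the two agree.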

--
Marc Glisse
