Re: "float complex" arithmetic performance much slower than expected

On 3/6/2013 3:09 PM, Michele Martone wrote:
On 20130306@12:08, Tim Prince wrote:
  . CFLAGS for gcc:
"-O3 -pipe -march=native -mtune=native -mavx -std=c99 -fno-unroll-loops"
  . CFLAGS for icc: "-O3 -xAVX -restrict -unroll=0"
...
Do you find this consistent with your experience of "complex" in gcc,
or might I be ignoring some basic rule of using gcc?

In the absence of -fcx-limited-range, gcc may protect divide and
sqrt by using library functions, where icc would simply widen to
double.  You would see any such library function usage if you
profiled with gprof, at least when the library is statically linked.
Also, the library functions used by gcc aren't vectorized, while icc
would go further toward promoting vectorization by in-lining code or
calling vector math functions.
In man gcc I see that -fcx-limited-range affects both multiplication and
division, while -fcx-fortran-rules only division.

The functions in my code only contain integer/floating point array accesses
and add / multiply operations.
So only multiplications may be accelerated by this or -ffast-math.

And as you suggest, man icc says -no-complex-limited-range is the
default, so "icc -O3" would need at least -complex-limited-range to be
fairly compared to "gcc -ffast-math".
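
To make the multiplication case concrete, here is a minimal sketch (not
code from this thread; the names caxpy and caxpy_limited are hypothetical).
The first function uses the C99 complex multiply operator, for which gcc
without -fcx-limited-range adds a NaN check and may fall back to a libgcc
helper (__mulsc3 for float complex). The second spells out the plain
four-multiply formula, which is roughly what the limited-range options
ask the compiler to emit.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical kernel: y[i] += a * x[i] using the built-in operator.
 * Without -fcx-limited-range (or -ffast-math) gcc has to honour the
 * C99 Annex G rules for NaN/Inf operands in this multiplication. */
void caxpy(int n, float complex a, const float complex *x, float complex *y)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Same update with the textbook formula written out by hand.
 * No special handling of NaN/Inf operands: finite inputs assumed. */
void caxpy_limited(int n, float complex a, const float complex *x,
                   float complex *y)
{
    assert(n >= 0);
    const float ar = crealf(a), ai = cimagf(a);
    for (int i = 0; i < n; i++) {
        const float xr = crealf(x[i]), xi = cimagf(x[i]);
        /* (ar + i*ai) * (xr + i*xi) */
        y[i] += (ar * xr - ai * xi) + (ar * xi + ai * xr) * I;
    }
}
```

Timing both versions with plain "gcc -O3" and again with
"gcc -O3 -fcx-limited-range" should show whether the complex multiply
is what costs the time. For finite inputs the two functions are expected
to agree.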

Vectorization reports for both compilers would shed light on this question.
Ok: Could you please suggest the options for getting "enough, but not too
many" report info ?
gcc -ftree-vectorizer-verbose=1 tells which loops are auto-vectorized.
Larger numbers give more details.
Similarly for icc: -vec-report1 (and larger numbers), or -opt-report.
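
As a small illustration of the report options, the loop below is a
hypothetical test case (the file name saxpy.c and the function name are
assumptions, not from the original code). The command lines in the
comment use the option spellings current for the compilers discussed
here; newer gcc releases replace -ftree-vectorizer-verbose with
-fopt-info-vec.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical loop for trying out the vectorization reports:
 *
 *   gcc -O3 -std=c99 -ftree-vectorizer-verbose=1 -c saxpy.c
 *   icc -O3 -std=c99 -vec-report1 -c saxpy.c
 *
 * The restrict qualifiers tell the compiler that x and y do not
 * overlap, which removes one common obstacle to vectorization.
 */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Level 1 should be enough to list the loops that were vectorized; raise
the number only for the loops that were not.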

--
Tim Prince


