Re: "float complex" arithmetic performance much slower than expected

Tim Prince <n8tm@xxxxxxx> · Wed, 06 Mar 2013 19:25:10 -0500

On 3/6/2013 3:09 PM, Michele Martone wrote:
On 20130306@12:08, Tim Prince wrote:
  . CFLAGS for gcc:
"-O3 -pipe -march=native -mtune=native -mavx -std=c99 -fno-unroll-loops"
  . CFLAGS for icc: "-O3 -xAVX -restrict -unroll=0"
...
Do you find this consistent with your experience with "complex" in gcc,
or could it be that I am ignoring some basic rule in using gcc?

In the absence of -fcx-limited-range, gcc may protect divide and
sqrt by using library functions, where icc would simply widen to
double.  You would see any such library function usage if you
profiled by gprof, at least when the library is static linked. Also,
the library functions used by gcc aren't vectorized, while icc would
go further toward promoting vectorization by in-lining code or
calling vector math functions.
In man gcc I see that -fcx-limited-range affects both multiplication and
division, while -fcx-fortran-rules only division.

The functions in my code only contain integer/floating point array accesses
and add / multiply operations.
So only multiplications may be accelerated by this or -ffast-math.
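
As a rough illustration of the point about multiplication, here is a minimal sketch of the kind of loop being discussed, written two ways. The function names and the loop are hypothetical and not taken from the code in this thread. The first version uses the built-in `*` operator, for which gcc may call a libgcc helper such as `__mulsc3` unless -fcx-limited-range or -ffast-math is given. The second spells out the limited-range formula by hand, so it needs no Inf/NaN recovery path.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* y[i] += a * x[i] using the built-in complex multiply.
 * Without -fcx-limited-range (or -ffast-math), gcc may emit a call to a
 * libgcc helper such as __mulsc3 here to handle Inf/NaN corner cases. */
void caxpy_builtin(size_t n, float complex a,
                   const float complex *restrict x,
                   float complex *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Same update with the textbook formula written out by hand:
 * (ar + i*ai)(xr + i*xi) = (ar*xr - ai*xi) + i*(ar*xi + ai*xr).
 * No overflow or NaN recovery is attempted (limited-range behaviour). */
void caxpy_manual(size_t n, float complex a,
                  const float complex *restrict x,
                  float complex *restrict y)
{
    assert(x != NULL && y != NULL);
    const float ar = crealf(a), ai = cimagf(a);
    for (size_t i = 0; i < n; ++i) {
        const float xr = crealf(x[i]), xi = cimagf(x[i]);
        const float re = ar * xr - ai * xi;
        const float im = ar * xi + ai * xr;
        y[i] += re + im * I;
    }
}
```

Comparing the generated assembly (or a profile) of the two functions with and without -fcx-limited-range should show whether the helper call is what costs the time; for finite inputs both versions compute the same values.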

And as you suggest, man icc says -no-complex-limited-range is the
default, so icc -O3 would need at least -complex-limited-range to be
fairly compared to gcc -ffast-math.

Vectorization reports for both compilers would shed light on this question.
Ok: Could you please suggest the options for getting "enough, but not too
much" report info?
gcc -ftree-vectorizer-verbose=1 tells which loops are auto-vectorized. Larger
numbers give more details.
Similarly, icc -vec-report1 and larger numbers, or -opt-report.
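
To try the report options on something small, a sketch like the following may help. The file name `vec.c` and the function are hypothetical, and the command lines in the comment only combine flags already mentioned in this thread. It is a plain float loop that both compilers would normally be expected to vectorize, so it gives a baseline for reading the reports before moving on to the complex loops. Newer gcc releases replaced -ftree-vectorizer-verbose with -fopt-info-vec, so the exact spelling depends on the compiler version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical file name: vec.c
 *
 * Report commands, using the options discussed in this thread:
 *   gcc -O3 -std=c99 -march=native -ftree-vectorizer-verbose=1 -c vec.c
 *   icc -O3 -xAVX -restrict -vec-report1 -c vec.c
 */

/* y[i] += a * x[i] on real floats; restrict tells the compiler that
 * x and y do not overlap, which removes one common obstacle to
 * auto-vectorization. */
void scale_add(size_t n, float a,
               const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

If this loop is reported as vectorized but the corresponding float complex loop is not, that points at the complex arithmetic itself rather than at the surrounding flags.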

--
Tim Prince