Hi, all!
I am compiling the STREAM benchmark with GCC 8.1:
gcc -march=skylake-avx512 -O3 -fopenmp stream.c -o stream
The generated binary doesn't use ZMM registers (AVX512), only YMM
registers (AVX2). The same code compiled with GCC 7.2 looks as expected,
i.e., 8-wide (512-bit) vector instructions on doubles are used.
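
For anyone who wants to check this without the full benchmark, here is a
minimal sketch of a loop with the same shape as the STREAM triad kernel.
It is not copied from stream.c; the function name, the signature and the
file name triad.c are made up for illustration, and I am assuming that
this reduced loop vectorizes the same way as the real one.

```c
#include <stddef.h>

/* Hypothetical reduced kernel: a[i] = b[i] + scalar * c[i].
 * `restrict` tells the compiler the arrays do not overlap, so the
 * loop can be vectorized without runtime alias checks. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double scalar, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}
```

Compiling it to assembly with
gcc -march=skylake-avx512 -O3 -S triad.c -o - | grep -c zmm
counts the lines that mention ZMM registers, which makes it easy to
compare the two compiler versions.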
Apparently the Intel compiler doesn't generate ZMM instructions unless told
to (-qopt-zmm-usage=high). John McCalpin says the reason is the extra
heat (and the resulting need to step down the frequency), see e.g.
https://software.intel.com/en-us/forums/intel-isa-extensions/topic/747994
Does anyone know whether the same limitation is implemented in GCC 8? If so,
is there a flag to force GCC to generate AVX512 instructions?
Thanks!