Hi, all!
I compiled the STREAM benchmark with GCC 8.1:
gcc -march=skylake-avx512 -O3 -fopenmp stream.c -o stream
The generated binary doesn't use ZMM registers (AVX-512), only YMM
registers (AVX2). The same code compiled with GCC 7.2 looks as expected,
i.e., 8-wide (512-bit) vector instructions are used.
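For anyone who wants to check this without the full benchmark, here is a
minimal sketch of a STREAM-like triad loop. It is only a stand-in and is
not taken from stream.c; the function name triad and its parameters are
made up for illustration, and I am assuming the compiler treats this
loop the same way it treats the benchmark kernels.

```c
#include <stddef.h>

/* Hypothetical stand-in for the STREAM triad kernel (not from stream.c).
 * The restrict qualifiers tell the compiler the arrays do not overlap,
 * which should let it vectorize the loop without runtime alias checks. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double scalar, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}
```

Saving this as triad.c (file name is arbitrary) and compiling it to
assembly with the same flags plus -S, e.g.
gcc -march=skylake-avx512 -O3 -S triad.c -o triad.s
then running "grep -c zmm triad.s" and "grep -c ymm triad.s" should show
which register width the compiler picked. A zmm count of 0 would match
what I see in the STREAM binary.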
Apparently the Intel compiler doesn't generate ZMM instructions unless
told to (-qopt-zmm-usage=high). John McCalpin says the reason is the
extra heat (and the resulting need to step down the frequency), e.g.
Does anyone know whether GCC 8 implements the same limitation? If so,
is there a flag to force GCC to generate AVX-512 instructions?