Hello,
I have an "Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz" (/proc/cpuinfo),
which is a Haswell CPU and reports all of the following flags:
sse sse2 ssse3 sse4_1 sse4_2 avx avx2
I usually compile with these flags for optimization:
CFLAGS="-march=native -mfpmath=sse -O2"
CXXFLAGS="-march=native -mfpmath=sse -O2"
Now to my questions:
1. Since I have avx and avx2, is it viable to add -mavx / -mavx2 to the
above-mentioned flags to improve overall optimization? Does
-march=native enable these automatically?
2. Would adding these flags improve performance in all / most cases?
Are there cases where adding -mavx / -mavx2 backfires, e.g. by
reducing performance?
3. Are these flags worth anything even if the code does not
explicitly use any AVX/AVX2 instructions?
4. To quote the GCC 4.9.2 man page:
> GCC depresses SSEx instructions when -mavx is used. Instead,
> it generates new AVX instructions or AVX equivalence for all
> SSEx instructions when needed.
What happens with -mfpmath=sse when -mavx / -mavx2 is enabled? Or
does this mean only the -msse* flags are depressed?
Since -msse2avx is turned on by -mavx automatically, are
-mfpmath=sse instructions also encoded with a VEX prefix? Would it not
make sense to add an option such as -mfpmath=avx?
5. If I have both avx and avx2, will -mavx2 switch on -mavx
automatically? (this is not covered by the man page)
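For questions 1 and 5, one thing I could also check is which SIMD feature macros the preprocessor predefines for a given set of flags. Below is a minimal sketch, assuming an x86 gcc is on PATH and accepts these flags; the helper name simd_macros is my own invention and not part of GCC:

```shell
#!/bin/sh
# simd_macros (hypothetical helper): keep only SSE/SSSE3/AVX/FMA feature
# macros from a `gcc -dM -E` predefined-macro dump read on stdin.
simd_macros() {
    grep -E '^#define __(SSE|SSSE3|AVX|FMA)[A-Z0-9_]* '
}

# Dump the predefined macros for my usual flags (assumption: an x86 gcc is
# installed). Prints a note instead if nothing matches or gcc rejects the flags.
if command -v gcc >/dev/null 2>&1; then
    gcc -march=native -mfpmath=sse -O2 -dM -E -x c /dev/null | simd_macros ||
        echo "no SIMD feature macros reported"
fi
```

Replacing -march=native with just -mavx2 in the gcc call and looking for __AVX__ in the result should indicate whether -mavx2 pulls in -mavx, though I would still appreciate confirmation that this is the intended behaviour.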
To answer some of my questions myself, I ran (see output attached):
gcc -march=native -mfpmath=sse -O2 -Q --help=target -v
The output shows that -mavx and -mavx2 are turned on by default with
-march=native on my Haswell CPU. However, -msse2avx is still
disabled. Shouldn't this be enabled as well, since -mavx is enabled? Is
this a bug?
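Since the full -Q --help=target dump is long, the relevant lines can be pulled out with a small filter. This is only a sketch, assuming awk and an x86 gcc are available; simd_options is a hypothetical name I made up for illustration:

```shell
#!/bin/sh
# simd_options (hypothetical helper): from `gcc -Q --help=target` output on
# stdin, print "<option> <state>" for the AVX/SSE/SSSE3/FMA/fpmath options.
simd_options() {
    awk '$1 ~ /^-m(avx|s?sse|fma|fpmath)/ { print $1, $2 }'
}

# Apply it to my usual flags (assumption: gcc is on PATH); stderr is dropped
# so only the option table is filtered.
if command -v gcc >/dev/null 2>&1; then
    gcc -march=native -mfpmath=sse -O2 -Q --help=target 2>/dev/null | simd_options
fi
```

On my machine this should reduce the attached output to the handful of lines that matter here, including the -mavx, -mavx2 and -msse2avx entries.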
Best Regards
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /build/gcc-multilib/src/gcc-4.9-20150204/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-cloog-backend=isl --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-multilib --disable-werror --enable-checking=release
Thread model: posix
gcc version 4.9.2 20150204 (prerelease) (GCC)
COLLECT_GCC_OPTIONS='-march=native' '-mfpmath=sse' '-O2' '-Q' '--help=target' '-v'
/usr/lib/gcc/x86_64-unknown-linux-gnu/4.9.2/cc1 -v help-dummy -march=haswell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=3072 -mtune=haswell -dumpbase help-dummy -mfpmath=sse -auxbase help-dummy -O2 -version --help=target -o /tmp/ccfdt7Ak.s
The following options are target specific:
-m128bit-long-double [disabled]
-m16 [disabled]
-m32 [disabled]
-m3dnow [disabled]
-m3dnowa [disabled]
-m64 [enabled]
-m80387 [enabled]
-m8bit-idiv [disabled]
-m96bit-long-double [enabled]
-mabi= sysv
-mabm [enabled]
-maccumulate-outgoing-args [disabled]
-maddress-mode= short
-madx [disabled]
-maes [enabled]
-malign-double [disabled]
-malign-functions= 0
-malign-jumps= 0
-malign-loops= 0
-malign-stringops [enabled]
-mandroid [disabled]
-march= haswell
-masm= att
-mavx [enabled]
-mavx2 [enabled]
-mavx256-split-unaligned-load [disabled]
-mavx256-split-unaligned-store [disabled]
-mavx512cd [disabled]
-mavx512er [disabled]
-mavx512f [disabled]
-mavx512pf [disabled]
-mbionic [disabled]
-mbmi [enabled]
-mbmi2 [enabled]
-mbranch-cost= 0
-mcld [disabled]
-mcmodel= 32
-mcpu=
-mcrc32 [disabled]
-mcx16 [enabled]
-mdispatch-scheduler [disabled]
-mdump-tune-features [disabled]
-mf16c [enabled]
-mfancy-math-387 [enabled]
-mfentry [enabled]
-mfma [enabled]
-mfma4 [disabled]
-mforce-drap [disabled]
-mfp-ret-in-387 [enabled]
-mfpmath= sse
-mfsgsbase [enabled]
-mfused-madd
-mfxsr [enabled]
-mglibc [enabled]
-mhard-float [enabled]
-mhle [disabled]
-mieee-fp [enabled]
-mincoming-stack-boundary= 0
-minline-all-stringops [disabled]
-minline-stringops-dynamically [disabled]
-mintel-syntax
-mlarge-data-threshold= 0x10000
-mlong-double-128 [disabled]
-mlong-double-64 [disabled]
-mlong-double-80 [enabled]
-mlwp [disabled]
-mlzcnt [enabled]
-mmemcpy-strategy=
-mmemset-strategy=
-mmmx [enabled]
-mmovbe [enabled]
-mms-bitfields [disabled]
-mno-align-stringops [disabled]
-mno-default [disabled]
-mno-fancy-math-387 [disabled]
-mno-push-args [disabled]
-mno-red-zone [disabled]
-mno-sse4 [disabled]
-momit-leaf-frame-pointer [disabled]
-mpc32 [disabled]
-mpc64 [disabled]
-mpc80 [disabled]
-mpclmul [enabled]
-mpopcnt [enabled]
-mprefer-avx128 [disabled]
-mpreferred-stack-boundary= 0
-mprefetchwt1 [disabled]
-mprfchw [disabled]
-mpush-args [enabled]
-mrdrnd [enabled]
-mrdseed [disabled]
-mrecip [disabled]
-mrecip=
-mred-zone [enabled]
-mregparm= 0
-mrtd [disabled]
-mrtm [disabled]
-msahf [enabled]
-msha [disabled]
GNU C (GCC) version 4.9.2 20150204 (prerelease) (x86_64-unknown-linux-gnu)
compiled by GNU C version 4.9.2 20150204 (prerelease), GMP version 6.0.0, MPFR version 3.1.2-p11, MPC version 1.0.2
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
-msoft-float [disabled]
-msse [enabled]
-msse2 [enabled]
-msse2avx [disabled]
-msse3 [enabled]
-msse4 [enabled]
-msse4.1 [enabled]
-msse4.2 [enabled]
-msse4a [disabled]
-msse5
-msseregparm [disabled]
-mssse3 [enabled]
-mstack-arg-probe [disabled]
-mstack-protector-guard= tls
-mstackrealign [enabled]
-mstringop-strategy= [default]
-mtbm [disabled]
-mtls-dialect= gnu
-mtls-direct-seg-refs [enabled]
-mtune-ctrl=
-mtune= haswell
-muclibc [disabled]
-mveclibabi= [default]
-mvect8-ret-in-mem [disabled]
-mvzeroupper [disabled]
-mx32 [disabled]
-mxop [disabled]
-mxsave [enabled]
-mxsaveopt [enabled]
Known assembler dialects (for use with the -masm-dialect= option):
att intel
Known ABIs (for use with the -mabi= option):
ms sysv
Known code models (for use with the -mcmodel= option):
32 kernel large medium small
Valid arguments to -mfpmath=:
387 387+sse 387,sse both sse sse+387 sse,387
Known vectorization library ABIs (for use with the -mveclibabi= option):
acml svml
Known address mode (for use with the -maddress-mode= option):
long short
Known stack protector guard (for use with the -mstack-protector-guard= option):
global tls
Valid arguments to -mstringop-strategy=:
byte_loop libcall loop rep_4byte rep_8byte rep_byte unrolled_loop
vector_loop
Known TLS dialects (for use with the -mtls-dialect= option):
gnu gnu2
COLLECT_GCC_OPTIONS='-march=native' '-mfpmath=sse' '-O2' '-Q' '--help=target' '-v'
as -v --64 -o /tmp/cc5v6wIa.o /tmp/ccfdt7Ak.s
GNU assembler version 2.25.0 (x86_64-unknown-linux-gnu) using BFD version (GNU Binutils) 2.25.0