On Wed, Jul 14, 2021 at 1:15 PM Hongtao Liu <crazylht@xxxxxxxxx> wrote:
>
> Hi:
>   The original problem was that some users wanted the command-line option
> -ffast-math not to affect code generated from intrinsics, i.e. for code
> like
>
> #include <immintrin.h>
> __m256d
> foo2 (__m256d a, __m256d b, __m256d c, __m256d d)
> {
>   __m256d tmp = _mm256_add_pd (a, b);
>   tmp = _mm256_sub_pd (tmp, c);
>   tmp = _mm256_sub_pd (tmp, d);
>   return tmp;
> }
>
> compiled with -O2 -mavx2 -ffast-math, users expect the generated code to be
>
>   vaddpd ymm0, ymm0, ymm1
>   vsubpd ymm0, ymm0, ymm2
>   vsubpd ymm0, ymm0, ymm3
>
> and not the reassociated
>
>   vsubpd ymm1, ymm1, ymm2
>   vsubpd ymm0, ymm0, ymm3
>   vaddpd ymm0, ymm1, ymm0
>
> On the LLVM side, there are mechanisms like
>
> #pragma float_control(precise, on, push)
> ...(intrinsics definition)..
> #pragma float_control(pop)
>
> When intrinsics are inlined, their IR is marked with
> "no-fast-math", and even if the caller is compiled with -ffast-math,
> reassociation only happens to IR that is not marked with
> "no-fast-math". It seems more flexible to support fast-math
> control of a region (inside a function). Testcase: https://godbolt.org/z/9cYMGGWPG
>
> Does GCC have a similar mechanism?
>
> --
> BR,
> Hongtao

--
BR,
Hongtao
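
To make concrete why the instruction order matters, here is a minimal scalar sketch of one lane of foo2. The function names source_order and reassociated_order are made up for illustration and are not part of any intrinsics header. The first follows the order written in the source, the second the (b - c) + (a - d) order of the reassociated listing. It assumes a build without -ffast-math, so the compiler keeps each function's evaluation order.

```c
/* Scalar stand-in for one lane of foo2, evaluated in source order:
   ((a + b) - c) - d.  Hypothetical helper, for illustration only. */
double source_order(double a, double b, double c, double d)
{
    double tmp = a + b;
    tmp = tmp - c;
    tmp = tmp - d;
    return tmp;
}

/* The same expression in the reassociated order of the second listing:
   (b - c) + (a - d).  Hypothetical helper, for illustration only. */
double reassociated_order(double a, double b, double c, double d)
{
    double t1 = b - c;   /* vsubpd ymm1, ymm1, ymm2 */
    double t0 = a - d;   /* vsubpd ymm0, ymm0, ymm3 */
    return t1 + t0;      /* vaddpd ymm0, ymm1, ymm0 */
}
```

With a = 1.0, b = c = 1e30 and d = 0.0, source_order returns 0.0 because the 1.0 is lost when it is added to 1e30, while reassociated_order returns 1.0 because b - c cancels exactly first. Users who pick the order of their intrinsics deliberately would see this kind of difference after reassociation.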