Hi,

I'm coming back to this after some experiments.

If one compiles the attached example with

gcc -c archtest1.c

one gets the output

archtest1.c:4:2: warning: #warning outer file: AVX512F not defined [-Wcpp]
 #warning outer file: AVX512F not defined
  ^~~~~~~
In file included from archtest1.c:8:
archtest2.c:2:2: warning: #warning inner file: AVX512F defined [-Wcpp]
 #warning inner file: AVX512F defined

which seems to contradict what Jonathan said about macros not being
influenced by the #pragmas.

However, if I compile the same code with clang, I get

martin@debian:~/tmp$ clang-7 -c archtest1.c
archtest1.c:4:2: warning: outer file: AVX512F not defined [-W#warnings]
#warning outer file: AVX512F not defined
 ^
In file included from archtest1.c:8:
./archtest2.c:4:2: warning: inner file: AVX512F not defined [-W#warnings]
#warning inner file: AVX512F not defined
 ^
2 warnings generated.

So the compilers behave differently, even though clang tries to emulate
the GCC pragma.

My question is now: is the fact that gcc defines the __AVX512F__ macro
in the included file a bug, or is this working as intended?

Thanks,
  Martin

On 10/25/18 2:50 PM, Marc Glisse wrote:
> On Thu, 25 Oct 2018, Martin Reinecke wrote:
>
>> Hi Jonathan,
>>
>> thanks for the quick reply!
>>
>>> Macros are defined during preprocessing, and the preprocessor doesn't
>>> know anything about the target_clones attribute. When the compiler
>>> sees the attribute it can't go back in time and alter the result of
>>> earlier preprocessing.
>>
>> I feared as much.
>> This creates a nasty asymmetry in the sense that gcc's own optimizations
>> will be able to use all target features (because the compiler knows that
>> it is OK to use specific features like AVX instructions) whereas the
>> user has no way to hand-optimize where this becomes necessary. At least
>> not using this nice mechanism.
>>
>>>> Is there a way to achieve what I have in mind?
>>>
>>> If you want three different implementations of the function I think
>>> you need three different clones. Or do runtime checks for the CPU
>>> features inside the function, but that seems suboptimal.
>>
>> I guess I'll just put all functions in question in a separate file and
>> compile this with different flags and name prefixes.
>
> target_clones does nothing magic, you can also look at target and ifunc.
> https://gcc.gnu.org/wiki/FunctionMultiVersioning
>
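The "target" attribute and the runtime CPU check mentioned in the quoted
replies can be combined by hand. The following is only a minimal sketch of
that idea, not what target_clones generates internally: the names
sum_scalar, sum_avx, sum_dispatch and HAVE_X86_DISPATCH are hypothetical, it
assumes GCC or clang on x86 (other targets fall back to the scalar loop),
and it uses plain AVX rather than AVX512F so that it runs on more machines.

```c
#include <stddef.h>

/* Assumption: GCC or clang on x86; elsewhere only the scalar path is built. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

/* Baseline version, compiled with the command-line target flags. */
static float sum_scalar(const float *x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

#ifdef HAVE_X86_DISPATCH
/* Hand-written AVX version; the attribute enables AVX for this function only. */
__attribute__((target("avx")))
static float sum_avx(const float *x, size_t n)
{
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));

    float lanes[8];
    _mm256_storeu_ps(lanes, acc);

    float s = 0.0f;
    for (int k = 0; k < 8; ++k)
        s += lanes[k];
    for (; i < n; ++i)          /* remaining elements */
        s += x[i];
    return s;
}
#endif

/* Runtime check: take the AVX path only if the running CPU supports it. */
float sum_dispatch(const float *x, size_t n)
{
#ifdef HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx"))
        return sum_avx(x, n);
#endif
    return sum_scalar(x, n);
}
```

Note that the choice of intrinsics here is fixed by hand per function and
does not depend on __AVX__ or __AVX512F__ being defined, which sidesteps the
preprocessor question discussed above.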
archtest1.c:

#ifdef __AVX512F__
#warning outer file: AVX512F defined
#else
#warning outer file: AVX512F not defined
#endif

#pragma GCC target("avx512f")
#include "archtest2.c"

archtest2.c:

#ifdef __AVX512F__
#warning inner file: AVX512F defined
#else
#warning inner file: AVX512F not defined
#endif
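
The same check can be made within a single file, which may help to tell
whether the macro changes because of the #include or simply because the test
comes after the #pragma. This is only a sketch: the macro and function names
(AVX512F_BEFORE_PRAGMA, AVX512F_AFTER_PRAGMA, avx512f_before_pragma,
avx512f_after_pragma) are hypothetical, and the pragmas are guarded so that
the file still builds on non-x86 targets.

```c
/* Sampled before any target pragma: reflects the command-line flags only. */
#ifdef __AVX512F__
#define AVX512F_BEFORE_PRAGMA 1
#else
#define AVX512F_BEFORE_PRAGMA 0
#endif

/* Assumption: a GCC-compatible compiler on x86; other targets skip the pragmas. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#pragma GCC push_options
#pragma GCC target("avx512f")
#endif

/* Sampled after the pragma, in the same file, with no #include involved. */
#ifdef __AVX512F__
#define AVX512F_AFTER_PRAGMA 1
#else
#define AVX512F_AFTER_PRAGMA 0
#endif

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#pragma GCC pop_options
#endif

/* Defined after pop_options, so these are built with the baseline target. */
int avx512f_before_pragma(void) { return AVX512F_BEFORE_PRAGMA; }
int avx512f_after_pragma(void)  { return AVX512F_AFTER_PRAGMA; }
```

Printing both values from main shows what the preprocessor saw at each
point. Going by the outputs above, one would expect gcc to report 0 and then
1 when compiled without -mavx512f, and clang-7 to report 0 both times.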