On Thu, 25 Oct 2018 at 12:46, Martin Reinecke <martin@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> I'm trying to use gcc's "target_clones" attribute for some functions in
> a performance critical library. These functions use gcc builtins and
> choose between different sets (standard code, SSE2, AVX) depending on
> the predefined macros __SSE2__ and __AVX__.
> Unfortunately these macros apparently are not set by the compiler when
> it compiles for the individual targets.
>
> Consider the code below:
>
> #include <stdio.h>
>
> __attribute__((target_clones("avx","sse2","default")))
> void foo(void)
> {
> #if defined(__AVX__)
>   printf("AVX\n");
> #elif defined(__SSE2__)
>   printf("SSE2\n");
> #else
>   printf("nothing special\n");
> #endif
> }
>
> int main(void)
> {
>   foo();
>   return 0;
> }
>
> Compiling and running this in an AVX-capable CPU prints "SSE2", where I
> would have hoped to see "AVX".

Macros are defined during preprocessing, and the preprocessor doesn't
know anything about the target_clones attribute. When the compiler sees
the attribute, it can't go back in time and alter the result of earlier
preprocessing.

> Is there a way to achieve what I have in mind?

If you want three different implementations of the function, I think you
need three different clones. Or do runtime checks for the CPU features
inside the function, but that seems suboptimal.
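A minimal sketch of the "three separate implementations" route, assuming GCC (or Clang) on x86: write one function per ISA level with the target attribute and choose among them once at run time with __builtin_cpu_supports. The names foo_avx, foo_sse2, foo_default, resolve_foo and foo_variant are hypothetical, made up for illustration. Note that the target attribute does not define __AVX__ or __SSE2__ either (same preprocessing problem), so each body has to be written explicitly for its ISA.

```c
#include <stdio.h>

/* Hypothetical helper type: pointer to one implementation of foo. */
typedef const char *(*foo_impl_fn)(void);

/* Each variant is compiled for its own ISA via the target attribute.
   The attribute changes code generation only; __AVX__/__SSE2__ are NOT
   defined inside these bodies, so ISA-specific code must be written
   here directly (builtins/intrinsics for that ISA). */
__attribute__((target("avx")))
static const char *foo_avx(void)
{
    /* AVX code would go here. */
    return "AVX";
}

__attribute__((target("sse2")))
static const char *foo_sse2(void)
{
    /* SSE2 code would go here. */
    return "SSE2";
}

static const char *foo_default(void)
{
    /* Portable fallback. */
    return "nothing special";
}

/* Choose the best implementation for the CPU we are running on. */
static foo_impl_fn resolve_foo(void)
{
    __builtin_cpu_init();  /* harmless here; required if called before constructors run */
    if (__builtin_cpu_supports("avx"))
        return foo_avx;
    if (__builtin_cpu_supports("sse2"))
        return foo_sse2;
    return foo_default;
}

/* Resolve lazily on first call, then reuse the cached pointer.
   Not synchronized: in multithreaded code, resolve once at startup. */
const char *foo_variant(void)
{
    static foo_impl_fn impl = NULL;
    if (impl == NULL)
        impl = resolve_foo();
    return impl();
}

void foo(void)
{
    printf("%s\n", foo_variant());
}
```

The feature check runs once, so each later call costs an indirect call rather than repeated CPU tests. On ELF/glibc targets, GCC's ifunc attribute can move the same resolver logic to load time.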