Hi everyone,

I am happy to announce the full implementation of Function Multiversioning (FMV) for C/C++ in mainline GCC (thanks to Evgeny). A distribution where you can already play with it (while other distributions merge it) is Clear Linux* OS for Intel® Architecture (https://clearlinux.org/features/function-multiversioning-fmv).

But what is FMV? Imagine that you are developing software that has to work on multiple platforms. At the end of the day it could be running anywhere: maybe on a server, maybe on a home computer. While Intel architecture provides many powerful instruction set extensions, it is challenging for developers to generate code that takes advantage of these capabilities. Currently we as developers have these choices:

-> Write multiple versions of the code, each targeting different instruction set extensions, and manually handle runtime dispatching of these versions (a hand-written dispatch sketch is shown near the end of this post)
-> Generate multiple versions of the binary, each targeting a different platform
-> Choose a minimum hardware requirement that will not take advantage of newer platforms

This seems like a lot of work. Wouldn't it be better to optimize the same functions for multiple architectures and have the binary detect the architecture at runtime and execute the right version? This feature exists and is known as Function Multiversioning (FMV). FMV is a compiler feature that optimizes the same code for multiple architectures and automatically selects the correct architecture-specific version of the code at runtime. The Clear Linux* Project for Intel® Architecture is currently the only Linux distribution to support Function Multiversioning in C code, making it easier to develop applications that take advantage of the enhanced instructions of the Intel architecture.

For example, consider the AVX2 instruction set extension, introduced in the 4th Generation Intel® Core™ processor family (formerly known as Haswell). Normally, telling the compiler to use AVX2 instructions would limit our binary to Haswell and newer processors. With FMV, the compiler can generate AVX2-optimized versions of the code and will automatically ensure, at runtime, that only the appropriate versions are used. In other words, when the binary runs on a Haswell or later generation CPU, it uses the Haswell-specific optimizations, and when that same binary runs on a pre-Haswell generation processor, it falls back to the standard instructions supported by the older processor.

You must be wondering how this is possible, so let's use a simple array addition, but with some FMV modifications:

#define MAX 1000000

int a[256], b[256], c[256];

/* The compiler emits one clone of foo() per listed target and resolves
   the right one at runtime. */
__attribute__((target_clones("arch=core-avx2","arch=atom","arch=slm","default")))
void foo(void)
{
    int i, x;

    for (x = 0; x < MAX; x++) {
        for (i = 0; i < 256; i++) {
            a[i] = b[i] + c[i];
        }
    }
}

int main(void)
{
    foo();
    return 0;
}

As you can see, in the __attribute__ line you specify the architectures you want your binary to run on. This binary has the same execution time as one built with the -mavx2 flag, but it can also run on an Atom system.

Our focus is on applying this technology to packages where we detect that AVX2 instructions can give a possible improvement. The experiments we have done show that some packages are already optimized internally for multiple instruction sets, so FMV is not needed there. Other compiler optimization techniques can take advantage of profile data to perform additional optimizations based on how the code behaves (AutoFDO).
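For contrast, here is a rough sketch of what the first option listed above, hand-written runtime dispatching, might look like without FMV. The function names (add_arrays, add_arrays_avx2, add_arrays_generic) are invented for illustration; the runtime check uses GCC's __builtin_cpu_supports() builtin. FMV generates essentially this boilerplate for you from a single function definition.

#include <stdio.h>

#define N 256

static int a[N], b[N], c[N];

/* Version compiled with AVX2 enabled for this one function only. */
__attribute__((target("avx2")))
static void add_arrays_avx2(void)
{
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i];
}

/* Plain fallback version for older CPUs. */
static void add_arrays_generic(void)
{
    for (int i = 0; i < N; i++)
        a[i] = b[i] + c[i];
}

/* Hand-written dispatcher: pick a version based on a runtime CPU check. */
static void add_arrays(void)
{
    if (__builtin_cpu_supports("avx2"))
        add_arrays_avx2();
    else
        add_arrays_generic();
}

int main(void)
{
    for (int i = 0; i < N; i++) {
        b[i] = i;
        c[i] = 2 * i;
    }
    add_arrays();
    printf("a[10] = %d\n", a[10]);
    return 0;
}

Multiply this pattern by every hot function and every instruction set extension you care about, and it is easy to see why letting the compiler do the cloning and dispatching is attractive.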
We use these optimizations and FMV to improve performance as much as possible. We invite the community to use this new feature and unleash the power of your code on multiple architectures with little effort. Write once and deploy everywhere!

Regards,

Victor Rodriguez
Intel Open Source Technology Center