Function Multiversioning (FMV) in mainline

Victor Rodriguez <vm.rod25@xxxxxxxxx> · Wed, 11 Nov 2015 09:25:41 -0600

Hi everyone

I am happy to announce the full implementation of Function
Multiversioning (FMV) for C/C++ in GCC mainline (thanks to
Evgeny). While other distributions pick it up, you can already try it
in Clear Linux* OS for Intel® Architecture
(https://clearlinux.org/features/function-multiversioning-fmv).

But what is FMV?

Imagine that you are developing software that could run on multiple
platforms. At the end of the day, it could be running anywhere, maybe
on a server or a home computer. While Intel architecture provides many
powerful instruction set extensions, it is challenging for developers
to generate code that takes advantage of these capabilities.

Currently, we as developers have these choices:

-> Write multiple versions of our code, each targeting different
instruction set extensions, and manually handle runtime dispatching of
these versions
-> Generate multiple versions of our binary, each targeting a
different platform
-> Choose a minimum hardware requirement, which means the code will
not take advantage of newer platforms
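
To make the first option concrete, here is a minimal sketch of manual
runtime dispatching. The function names (add_generic, add_avx2,
add_arrays) are hypothetical and not taken from any real package. It
assumes GCC (or a compatible compiler) and relies on the
__builtin_cpu_init / __builtin_cpu_supports builtins and the target
attribute on x86; on other platforms it simply falls back to the
generic loop.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline version: built with the default instruction set. */
static void add_generic(int *dst, const int *x, const int *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = x[i] + y[i];
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same loop, but the compiler may use AVX2 inside this one function. */
__attribute__((target("avx2")))
static void add_avx2(int *dst, const int *x, const int *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = x[i] + y[i];
}
#endif

/* Hand-written dispatcher: ask the CPU at run time which version is safe. */
void add_arrays(int *dst, const int *x, const int *y, size_t n)
{
    assert(dst != NULL && x != NULL && y != NULL);
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        add_avx2(dst, x, y, n);
        return;
    }
#endif
    add_generic(dst, x, y, n);
}
```

Every function that should benefit needs its own pair of versions and
its own dispatcher, which is the bookkeeping that quickly becomes
tedious.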

This seems like a lot of work. Wouldn’t it be better to optimize the
same functions for multiple architectures and have the binary pick the
right version once it detects the architecture at runtime? This feature exists and is
known as Function Multiversioning (FMV). FMV is a compiler feature
that is capable of optimizing the same code for multiple
architectures, automatically selecting the correct
architecture-specific version of the code at runtime. The Clear Linux*
Project for Intel® Architecture is currently the only Linux
distribution to support Function Multiversioning in C code, making it
easier to develop applications that take advantage of the enhanced
instructions of the Intel architecture.

For example, consider the AVX2 instruction set extension, introduced
in the 4th Generation Intel® Core™ processor family (formerly known as
Haswell). Normally, telling the compiler to use AVX2 instructions
would limit our binary to Haswell and newer processors. With FMV, the
compiler can generate AVX2-optimized versions of the code and will
automatically, at runtime, ensure that only the appropriate versions
are used. In other words, when the binary is run on Haswell or later
generation CPUs, it will use Haswell-specific optimizations, and when
that same binary is run on a pre-Haswell generation processor, it will
fall back to using the standard instructions supported by the older
processor.

You must be wondering if this is possible, so let’s take a simple array
addition and make one modification, the FMV attribute:

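#define MAX 1000000   /* number of times the addition loop is repeated */
int a[256], b[256], c[256];

/* Ask GCC to build one clone of foo() per listed target, plus a default */
__attribute__((target_clones("arch=core-avx2","arch=atom","arch=slm","default")))
void foo(void)
{
    int i, x;
    for (x = 0; x < MAX; x++) {
        for (i = 0; i < 256; i++) {
            a[i] = b[i] + c[i];
        }
    }
}

int main(void)
{
    foo();
    return 0;
}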

As you can see, the __attribute__ line lets you specify the
architectures you want your binary to be optimized for. This binary has
the same execution time as one built with the -mavx2 flag, but it can
also run on an Atom system.
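
If the same source also has to build with toolchains that lack
target_clones, one possible approach is to hide the attribute behind a
macro. The sketch below is only an illustration: FMV_CLONES and
add_arrays_fmv are hypothetical names, and the guard condition
(x86-64, Linux, glibc, and a compiler that reports the attribute via
__has_attribute) is my assumption about where the feature is
available, not an official list.

```c
#include <assert.h>   /* a libc header also makes __GLIBC__ visible on glibc */
#include <stddef.h>

/* Older compilers may not provide __has_attribute at all. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Assumed guard: only request clones where the dispatch machinery
 * is expected to exist; elsewhere the macro expands to nothing. */
#if defined(__x86_64__) && defined(__linux__) && defined(__GLIBC__) && \
    __has_attribute(target_clones)
#define FMV_CLONES __attribute__((target_clones("avx2", "default")))
#else
#define FMV_CLONES
#endif

/* One source-level function; where FMV_CLONES is active the compiler
 * emits an AVX2 clone and a default clone and selects one at run time. */
FMV_CLONES
void add_arrays_fmv(int *dst, const int *x, const int *y, size_t n)
{
    assert(dst != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = x[i] + y[i];
}
```

Callers just call add_arrays_fmv(a, b, c, 256); there is no dispatcher
to write, and on toolchains without the attribute the function builds
as a single plain version.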

Our focus is on applying this technology to packages where we detect
that AVX2 instructions can give an improvement. The
experiments we have done show that some packages are already optimized
internally for multiple instruction sets, so FMV would not be needed
there. Other compiler optimization techniques can take advantage of
the profile data to perform additional optimizations based on how the
code behaves (AutoFDO). We use these optimizations together with FMV to
improve performance as much as possible. We invite the community to use
this new feature and unleash the power of your code across multiple
architectures with little effort. Write once and deploy everywhere!

Regards

Victor Rodriguez
Intel Open Source Technology Center