Re: How to define multiple processor families?

Jonathan Wakely <jwakely.gcc@xxxxxxxxx> · Tue, 8 May 2018 11:31:14 +0100

On 8 May 2018 at 03:17, Feng Longda wrote:
> This is just for broadwell, but if the processor is skylake or some
> newer type, it can't full use of the processor new microcode.

Have you actually profiled and measured to see if using the new
instructions makes any difference for your code? (For example, by
compiling with -march=skylake-avx512 and running on the machine with
that processor).
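
A minimal sketch of such a measurement, assuming the hot code can be wrapped in a function with no arguments. The names example_kernel and best_time are hypothetical stand-ins, not anything from your code, and clock() is used only because it is standard C:

```c
#include <time.h>

enum { N = 4096, PASSES = 2000 };

static unsigned data[N];
volatile unsigned sink;   /* keeps the compiler from discarding the work */

/* Hypothetical stand-in for the real hot loop; replace with your own code. */
void example_kernel(void)
{
    unsigned s = 0;
    for (int p = 0; p < PASSES; p++) {
        for (int i = 0; i < N; i++) {
            data[i] = (unsigned)(i + p) * 3u;
            s += data[i];
        }
    }
    sink = s;
}

/* Run fn() `reps` times and return the fastest CPU time in seconds.
 * clock() has coarse granularity, so fn() should run for a while. */
double best_time(void (*fn)(void), int reps)
{
    double best = -1.0;
    for (int i = 0; i < reps; i++) {
        clock_t t0 = clock();
        fn();
        double dt = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}
```

Build the same source twice, once with -O2 -march=broadwell and once with -O2 -march=skylake-avx512, print best_time(example_kernel, 10) from each build on the skylake-avx512 machine, and compare the two numbers.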

If you haven't measured it, then apparently performance is not
important enough to you to bother measuring. So you might as
well just use -march=broadwell because that's the simplest solution.

If you have measured it, and the performance is not significantly
better, then you might as well just use -march=broadwell because
that's the simplest solution.

If you have measured it, and the performance is better, then it might
be worth using different instruction sets for different machines.

> Does we have better solution for this kind of use case.

Did you not read the other reply from Xi Ruoyao, which gave you a solution?

There are several approaches you could take. You could compile the
code separately for each family of machine. Or you could compare the
performance of each instruction set on each machine and see if you can
reduce the number of separate binaries you need, e.g. maybe binaries
compiled with -march=broadwell can be used on the broadwell and
haswell machines, and binaries compiled with -march=skylake can be
used on skylake and skylake-avx512 machines, so you only need two sets
of binaries. A variation on this approach is to compile only some
object files with different options, compile the rest with the
same options, and link different sets of objects into different
binaries for different machines.

Or if you've profiled the code and determined that there are specific
hotspots in the code that can benefit from different instruction sets
then you can use __attribute__((target("xxx"))) to compile different
functions with different options, and then call those different
functions according to which machine the code is running on (e.g. by
testing the processor flags at run-time). Or you can automate that by
using __attribute__((target_clones("arch=broadwell","arch=haswell","arch=skylake","arch=skylake-avx512","default")))
on the important functions, so that GCC will automatically create
multiple copies of the function compiled differently, and arrange for
the relevant function to be called automatically according to the
hardware the program runs on.
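
Both variants might look roughly like the sketch below. It assumes GCC on x86 with ifunc support (e.g. GNU/Linux); the function names are hypothetical, and the summing loop is only a placeholder for a real hotspot:

```c
#include <stddef.h>

/* Baseline version, built with whatever -march the file is compiled with. */
static unsigned sum_generic(const unsigned *v, size_t n)
{
    unsigned s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Same source, but GCC is allowed to use AVX2 inside this one function. */
__attribute__((target("avx2")))
static unsigned sum_avx2(const unsigned *v, size_t n)
{
    unsigned s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Manual dispatch: test the processor flags at run time. */
unsigned sum_dispatch(const unsigned *v, size_t n)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(v, n);
    return sum_generic(v, n);
}

/* Automatic dispatch: GCC emits one clone per listed target plus a
 * resolver that picks the clone when the program is loaded. */
__attribute__((target_clones("arch=haswell", "arch=skylake-avx512", "default")))
unsigned sum_cloned(const unsigned *v, size_t n)
{
    unsigned s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}
```

Callers just use sum_dispatch() or sum_cloned() like any other function. The manual version gives you control over exactly which CPU features are tested, while target_clones keeps a single copy of the source at the cost of a larger binary.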

But if you haven't profiled and determined which functions can benefit
from using different instruction sets then you would be wasting your
time and should just use the same binaries everywhere.