On 8 May 2018 at 03:17, Feng Longda wrote:
> This is just for broadwell, but if the processor is skylake or some
> newer type, it can't full use of the processor new microcode.

Have you actually profiled and measured to see if using the new
instructions makes any difference for your code? (For example, by
compiling with -march=skylake-avx512 and running on the machine with
that processor).

If you haven't measured it, then apparently performance is not so
important for you that you actually bother to measure. So you might as
well just use -march=broadwell because that's the simplest solution.

If you have measured it, and the performance is not significantly
better, then you might as well just use -march=broadwell because
that's the simplest solution.

If you have measured it, and the performance is better, then it might
be worth using different instruction sets for different machines.

> Does we have better solution for this kind of use case.

Did you not read the other reply from Xi Ruoyao, which gave you a solution?

There are several approaches you could take.

You could compile the code separately for each family of machine.

Or you could compare the performance of each instruction set on each
machine and see if you can reduce the number of separate binaries you
need, e.g. maybe binaries compiled with -march=haswell can be used on
the haswell and broadwell machines (a -march=broadwell binary may use
instructions that haswell lacks), and binaries compiled with
-march=skylake can be used on skylake and skylake-avx512 machines, so
you only need two sets of binaries.

A variation on this approach is to compile only some object files with
different options, compile the rest with the same options, and link
different sets of objects into different binaries for different
machines.

Or if you've profiled the code and determined that there are specific
hotspots in the code that can benefit from different instruction sets
then you can use __attribute__((target("xxx"))) to compile different
functions with different options, and then call those different
functions according to which machine the code is running on (e.g. by
testing the processor flags at run-time).

Or you can automate that by using
__attribute__((target_clones("default,arch=haswell,arch=broadwell,arch=skylake,arch=skylake-avx512")))
on the important functions (the list has to include "default", and
architectures are named with the arch= prefix), so that GCC will
automatically create multiple copies of the function compiled
differently, and arrange for the relevant function to be called
automatically according to the hardware the program runs on.

But if you haven't profiled and determined which functions can benefit
from using different instruction sets then you would be wasting your
time and should just use the same binaries everywhere.
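
To make the last two options concrete, here is a rough sketch of both
the manual dispatch and the target_clones variant. It assumes GCC (or a
compatible compiler) on x86-64 Linux with ifunc support; on anything
else the guard falls back to a single baseline version. The dot-product
function and all the names (dot_base, dot_avx2, dot_manual, dot_auto,
HAVE_X86_MULTIVERSION) are made up for illustration, and the choice of
"avx2,fma" and of the arch= list is an assumption, not something
measured on your code.

```c
#include <stddef.h>

/* Assumption: multiversioning is only attempted with a GNU-compatible
   compiler on x86-64 Linux. Elsewhere only the baseline is built. */
#if defined(__GNUC__) && defined(__x86_64__) && defined(__linux__)
#define HAVE_X86_MULTIVERSION 1
#else
#define HAVE_X86_MULTIVERSION 0
#endif

/* Hypothetical hot spot, built with whatever -march the file gets. */
static double dot_base(const double *a, const double *b, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += a[i] * b[i];
    return total;
}

#if HAVE_X86_MULTIVERSION

/* Same source, but this one function may use AVX2 and FMA. */
__attribute__((target("avx2,fma")))
static double dot_avx2(const double *a, const double *b, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += a[i] * b[i];
    return total;
}

/* Manual dispatch: test the processor flags at run time. */
double dot_manual(const double *a, const double *b, size_t n)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        return dot_avx2(a, b, n);
    return dot_base(a, b, n);
}

/* Automatic dispatch: the compiler emits one clone per listed target
   plus a resolver that picks one when the program is loaded. */
__attribute__((target_clones("arch=skylake-avx512", "arch=haswell", "default")))
double dot_auto(const double *a, const double *b, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += a[i] * b[i];
    return total;
}

#else

/* Fallback: no multiversioning, both entry points use the baseline. */
double dot_manual(const double *a, const double *b, size_t n)
{
    return dot_base(a, b, n);
}

double dot_auto(const double *a, const double *b, size_t n)
{
    return dot_base(a, b, n);
}

#endif
```

Callers just call dot_manual() or dot_auto() like any other function.
Whether either is faster than a plain -march=broadwell build is exactly
what you still have to measure.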