Hello,

I have run the benchmarks and here are the results:
https://openbenchmarking.org/result/2103142-HA-UARCHLEVE55

TL;DR:
- there is no or negligible performance benefit from *-march=nehalem*,
  which corresponds to x86_64-v2,
- there is a moderate benefit from *-march=haswell* (x86_64-v3), of
  around 10%-20% compared to the baseline for the tests performed.

Geometric Mean Of All Test Results
Result Composite (Geometric Mean, Higher Is Better)

  O1_generic ......... 367.99
  O3_generic ......... 459.84
  O3_march_nehalem ... 462.89
  O3_march_haswell ... 531.99

x86_64-v2:
There were only two tests in which march=nehalem was meaningfully faster
than march=x86-64 (the baseline architecture). These were
"graphicsmagick/Swirl" and "FLAC audio encoding". The FLAC results were
quite noisy (click the "Result confidence" button above the pie chart to
show the data), so the benefit may not be statistically significant.
Swirl appeared to be only around 4% faster.

I was surprised, because I expected the benefit to be somewhere around
5-10%. It looks like GCC's autovectorisation does not make much use of
the instructions added in SSE3/SSSE3/SSE4.

x86_64-v3:
The geometric mean of the test results was around 15% higher with
march=haswell than with baseline x86_64. Apart from john-the-ripper/md5,
the tests were up to 36% faster, with a median performance increase of
around 10%. [1]

As described in my previous email, I have excluded tests that use
dedicated code paths for processors supporting AVX/AVX2/etc., as I saw
little point in benchmarking them. I have also excluded some tests that
showed little difference between the -O1 and -O3 optimization levels, as
it appears that the compiler has little work to do there.

So the real-world performance benefit of compiling the whole of Arch for
x86_64-v3 would probably be smaller. I think that many workloads of a
"typical user" are I/O bound.
The limiting factor is likely to be an HDD/SSD, network throughput or
latency, or memory speed.

Limitations:
- GCC 9.3.0 was used, which is not the most recent compiler available.

Further research:
- benchmarking web browser performance, as this is what matters most for
  many users,
- comparing battery usage (Phoronix Test Suite has support for this when
  running benchmarks). I do not think it will differ much from the
  performance data, though.

How to reproduce:

  export CFLAGS="-O1 -mtune=generic -march=x86-64"
  export CXXFLAGS="-O1 -mtune=generic -march=x86-64"
  phoronix-test-suite benchmark 2103142-HA-UARCHLEVE55

  export CFLAGS="-O3 -mtune=generic -march=x86-64"
  export CXXFLAGS="-O3 -mtune=generic -march=x86-64"
  phoronix-test-suite benchmark $name_of_test_identifier_specified_before
  # etc.

Conflict of interest: I am opposed to increasing baseline x86_64
requirements in general-purpose distributions.

Greetings,
Mateusz

[1] Visit
https://openbenchmarking.org/result/2103142-HA-UARCHLEVE55&rmm=O1_generic%2CO3_march_nehalem
and scroll slightly lower.
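
P.S. As a quick sanity check of the percentages quoted in the TL;DR, the
composite geometric means from the results table can be expressed
relative to O3_generic. This is only a minimal sketch and not part of
the benchmark run: the helper name pct_vs_baseline is made up for
illustration, and the numbers are copied by hand from the table.

```shell
#!/bin/sh
# Composite geometric mean of the O3_generic run (copied from the table).
baseline=459.84

# Hypothetical helper: print the percentage change of $1 relative to
# $baseline, rounded to one decimal place.
pct_vs_baseline() {
    LC_ALL=C awk -v v="$1" -v b="$baseline" \
        'BEGIN { printf "%.1f\n", (v / b - 1) * 100 }'
}

echo "O1_generic:       $(pct_vs_baseline 367.99)%"
echo "O3_march_nehalem: $(pct_vs_baseline 462.89)%"
echo "O3_march_haswell: $(pct_vs_baseline 531.99)%"
```

With these inputs, O3_march_nehalem comes out below 1% and
O3_march_haswell at about 15.7% over O3_generic, which is in line with
the summary above. Note that this compares composite means only; the
per-test spread is visible on the openbenchmarking.org page.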