On Wed, Dec 4, 2024, at 16:36, Tor Vic wrote:
> On 12/4/24 11:30, Arnd Bergmann wrote:
> Similar but not identical changes have been proposed in the past several
> times like e.g. in 1, 2 and likely even more often.
>
> Your solution seems to be much cleaner, I like it.

Thanks. It looks like the other two did not actually address the bug
I'm fixing in my version.

> That said, on my Skylake platform, there is no difference between
> -march=x86-64 and -march=x86-64-v3 in terms of kernel binary size or
> performance.
> I think Boris also said that these settings make no real difference on
> code generation.

As Nathan pointed out, I had a typo in my patch, so the options didn't
actually do anything at all. I fixed it now and did a 'defconfig' test
build with all three:

> Other settings might make a small difference (numbers are from 2023):
>   -generic:       85.089.784 bytes
>   -core2:         85.139.932 bytes
>   -march=skylake: 85.017.808 bytes

    text      data      bss       dec      hex  filename
26664466  10806622  1490948  38962036  2528374  obj-x86/vmlinux-v1
26664466  10806622  1490948  38962036  2528374  obj-x86/vmlinux-v2
26662504  10806654  1490948  38960106  2527bea  obj-x86/vmlinux-v3

which is a tiny 2KB saved between v2 and v3. I looked at the object
code and found that the v3 version takes advantage of the BMI
extension, which makes perfect sense. Not sure if it has any real
performance benefits.

Between v1 and v2, there is a chance to turn things like
system_has_cmpxchg128() into a constant on v2 and higher.

The v4 version is meaningless in practice since it only adds AVX512
instructions, which are present in very few CPUs and not that useful
inside the kernel aside from specialized crypto and raid helpers.

      Arnd
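
For anyone who wants to double-check the "2KB" figure, the per-section
deltas can be recomputed from the size output above. This is only a
rough sketch: the numbers are copied from the table, while the `sizes`
dictionary and the `delta()` helper are names made up for illustration.

```python
# Rows copied from the size output in this mail: (text, data, bss) per build.
sizes = {
    "v1": (26664466, 10806622, 1490948),
    "v2": (26664466, 10806622, 1490948),
    "v3": (26662504, 10806654, 1490948),
}


def delta(a, b):
    """Return per-section and total byte differences going from build a to b.

    Hypothetical helper, not part of any kernel tooling.
    """
    per_section = tuple(y - x for x, y in zip(sizes[a], sizes[b]))
    return per_section, sum(per_section)


for a, b in (("v1", "v2"), ("v2", "v3")):
    (text, data, bss), total = delta(a, b)
    print(f"{a} -> {b}: text {text:+d}, data {data:+d}, "
          f"bss {bss:+d}, total {total:+d}")
```

With these inputs, v1 and v2 come out identical, and v3 shrinks text by
1962 bytes while data grows by 32 bytes, for a net 1930 bytes saved.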