On 9/12/21 4:01 PM, Bart Van Assche wrote:
> On 9/12/21 06:03, Jens Axboe wrote:
>> On 9/11/21 9:19 PM, Bart Van Assche wrote:
>>> The performance numbers in the patch description come from a
>>> Intel Xeon Gold 6154 CPU. I reran the test today on an old Intel
>>> Core i7-4790 CPU and obtained the opposite result: higher IOPS
>>> without this patch than with this patch although the assembler
>>> code looks to be the same. It seems like how fast "rep stos"
>>> runs depends on the CPU type?
>>
>> It does appear so. Which is a bit frustrating...
>
> Further measurements have shown that this behavior is specific to
> gcc and also that clang always generates faster code for the version
> of bio_init() in my patch. I have reported this as a bug to the gcc
> project. See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102294.

Interesting! Here are some results from my end. First the 3970X again:

gcc-11.1
Elapsed time: 0.980807 s
Elapsed time: 0.452951 s
Elapsed time: 0.949918 s

clang-11.0
Elapsed time: 0.284734 s
Elapsed time: 0.356595 s
Elapsed time: 0.285459 s

And my laptop, which is using:

11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz

gcc-11.1
Elapsed time: 0.218427 s
Elapsed time: 0.235000 s
Elapsed time: 0.214217 s

clang-11.0
Elapsed time: 0.217436 s
Elapsed time: 0.170959 s
Elapsed time: 0.149630 s

All compiles done with -O2 -march=native

Now I kind of want to compile the kernel with clang and see how that
goes...

-- 
Jens Axboe
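
For readers following along: the test program behind the "Elapsed time" numbers is not included in this message. The following is only a rough sketch of the kind of user-space comparison being discussed, clearing a small structure with memset() versus assigning each member. It assumes a made-up structure, struct fake_bio, which is not the kernel's struct bio, and the names init_memset(), init_members() and time_init() are hypothetical. It also assumes gcc or clang, since it uses an inline-asm compiler barrier.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in structure; NOT the real struct bio layout. */
struct fake_bio {
	void *next;
	void *bdev;
	unsigned int opf;
	unsigned short flags;
	unsigned short ioprio;
	unsigned long sector;
	unsigned int size;
	unsigned int idx;
	void *end_io;
	void *private_data;
	unsigned short vcnt;
	unsigned short max_vecs;
	int cnt;
	void *io_vec;
	void *pool;
};

typedef void (*init_fn)(struct fake_bio *, void *, unsigned short);

/* Variant 1: clear everything with memset(), then set the non-zero fields. */
void init_memset(struct fake_bio *b, void *table, unsigned short max_vecs)
{
	memset(b, 0, sizeof(*b));
	b->cnt = 1;
	b->io_vec = table;
	b->max_vecs = max_vecs;
}

/* Variant 2: assign every member explicitly (padding is left untouched). */
void init_members(struct fake_bio *b, void *table, unsigned short max_vecs)
{
	b->next = NULL;
	b->bdev = NULL;
	b->opf = 0;
	b->flags = 0;
	b->ioprio = 0;
	b->sector = 0;
	b->size = 0;
	b->idx = 0;
	b->end_io = NULL;
	b->private_data = NULL;
	b->vcnt = 0;
	b->max_vecs = max_vecs;
	b->cnt = 1;
	b->io_vec = table;
	b->pool = NULL;
}

/* Run init() 'iterations' times and return the elapsed wall time in seconds. */
double time_init(init_fn init, long iterations)
{
	struct fake_bio b;
	struct timespec start, end;

	assert(iterations > 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < iterations; i++) {
		init(&b, NULL, 4);
		/* Compiler barrier (gcc/clang) so the stores are not elided. */
		__asm__ __volatile__("" : : "r"(&b) : "memory");
	}
	clock_gettime(CLOCK_MONOTONIC, &end);
	return (double)(end.tv_sec - start.tv_sec) +
	       (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_init(init_memset, n) and time_init(init_members, n) from a small main() for a large n, and building once with gcc and once with clang at -O2 -march=native, would give numbers in the same spirit as the ones above. Because the call goes through a function pointer, the compiler may not inline the initializers, so absolute timings from this sketch are not expected to match the figures in this thread.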