On 10/4/21 2:19 PM, Jens Axboe wrote:
> On 10/4/21 1:02 PM, Pavel Begunkov wrote:
>> fio/t/io_uring -s32 -d32 -c32 -N1
>>
>>          | baseline | 0-15     | 0-16       | diff
>> setup 1: | 34 MIOPS | 42 MIOPS | 42.2 MIOPS | 25 %
>> setup 2: | 31 MIOPS | 31 MIOPS | 32 MIOPS   | ~3 %
>>
>> Setup 1 gets 25% performance improvement, which is unexpected and a
>> share of it should be accounted as compiler/HW magic. Setup 2 is just
>> 3%, but the catch is that some of the patches _very_ unexpectedly sink
>> performance, so it's more like 31 MIOPS -> 29 -> 30 -> 29 -> 31 -> 32
>>
>> I'd suggest to leave 16/16 aside, maybe for future consideration and
>> refinement. The end result is not very clear, I'd expect probably
>> around 3-5% with a more stable setup for nops32, and a better win
>> for io_cqring_ev_posted() intensive cases like BPF.
>
> Looks and tests good for me. I've skipped 16/16 for now, we can
> evaluate that one later.

For reference, running this on just the faster box:

Setup/Test    |  Peak-1-thread   Peak-2-threads   NOPS    Diff
------------------------------------------------------------------
Setup 2 pre   |      5.07M           5.74M        71.1M
Setup 2 post  |      5.23M           5.84M        73.9M

which is a pretty substantial win.

-- 
Jens Axboe
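
The Diff column in the Setup 2 table above is empty, so here is a minimal sketch of how the relative gains could be worked out from the pre/post figures. The helper name `pct_gain` and the dictionary layout are illustrative only and are not part of fio or the patch series; the numbers are copied from the table as printed (in millions).

```python
def pct_gain(pre: float, post: float) -> float:
    """Relative change from pre to post, expressed in percent."""
    return (post - pre) / pre * 100.0


# Figures copied from the "Setup 2 pre" / "Setup 2 post" rows (in millions).
results = {
    "Peak-1-thread": (5.07, 5.23),
    "Peak-2-threads": (5.74, 5.84),
    "NOPS": (71.1, 73.9),
}

for name, (pre, post) in results.items():
    # One line per test: pre value, post value, signed percentage change.
    print(f"{name:<15} {pre:>6.2f}M -> {post:>6.2f}M  {pct_gain(pre, post):+.1f}%")
```

On these figures the gains come out at roughly 3.2%, 1.7% and 3.9%, which falls within or slightly below the 3-5% range estimated in the quoted message.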