On 10/4/21 1:02 PM, Pavel Begunkov wrote:
> fio/t/io_uring -s32 -d32 -c32 -N1
>
>          | baseline | 0-15     | 0-16       | diff
> setup 1: | 34 MIOPS | 42 MIOPS | 42.2 MIOPS | 25 %
> setup 2: | 31 MIOPS | 31 MIOPS | 32 MIOPS   | ~3 %
>
> Setup 1 gets 25% performance improvement, which is unexpected and a
> share of it should be accounted as compiler/HW magic. Setup 2 is just
> 3%, but the catch is that some of the patches _very_ unexpectedly sink
> performance, so it's more like 31 MIOPS -> 29 -> 30 -> 29 -> 31 -> 32
>
> I'd suggest to leave 16/16 aside, maybe for future consideration and
> refinement. The end result is not very clear, I'd expect probably
> around 3-5% with a more stable setup for nops32, and a better win
> for io_cqring_ev_posted() intensive cases like BPF.

Looks and tests good for me. I've skipped 16/16 for now, we can evaluate
that one later.

-- 
Jens Axboe