Re: [PATCH 00/16] squeeze more performance

Jens Axboe <axboe@xxxxxxxxx> · Mon, 4 Oct 2021 14:19:39 -0600

On 10/4/21 1:02 PM, Pavel Begunkov wrote:
> fio/t/io_uring -s32 -d32 -c32 -N1
> 
>           | baseline  | 0-15      | 0-16        | diff
> setup 1:  | 34 MIOPS  | 42 MIOPS  | 42.2  MIOPS | 25 %
> setup 2:  | 31 MIOPS  | 31 MIOPS  | 32    MIOPS | ~3 %
> 
> Setup 1 gets a 25% performance improvement, which is unexpected; a
> share of it should be attributed to compiler/HW magic. Setup 2 is just
> 3%, but the catch is that some of the patches _very_ unexpectedly sink
> performance, so it's more like 31 MIOPS -> 29 -> 30 -> 29 -> 31 -> 32.
> 
> I'd suggest leaving 16/16 aside, maybe for future consideration and
> refinement. The end result is not very clear; I'd expect probably
> around 3-5% with a more stable setup for nops32, and a bigger win
> for io_cqring_ev_posted()-intensive cases like BPF.

Looks and tests good to me. I've skipped 16/16 for now; we can
evaluate that one later.

-- 
Jens Axboe