Re: [PATCH 00/16] squeeze more performance

Jens Axboe <axboe@xxxxxxxxx> · Mon, 4 Oct 2021 14:33:10 -0600

On 10/4/21 2:19 PM, Jens Axboe wrote:
> On 10/4/21 1:02 PM, Pavel Begunkov wrote:
>> fio/t/io_uring -s32 -d32 -c32 -N1
>>
>>           | baseline  | 0-15      | 0-16        | diff
>> setup 1:  | 34 MIOPS  | 42 MIOPS  | 42.2  MIOPS | 25 %
>> setup 2:  | 31 MIOPS  | 31 MIOPS  | 32    MIOPS | ~3 %
>>
>> Setup 1 gets 25% performance improvement, which is unexpected and a
>> share of it should be accounted as compiler/HW magic. Setup 2 is just
>> 3%, but the catch is that some of the patches _very_ unexpectedly sink
>> performance, so it's more like 31 MIOPS -> 29 -> 30 -> 29 -> 31 -> 32
>>
>> I'd suggest to leave 16/16 aside, maybe for future consideration and
>> refinement. The end result is not very clear, I'd expect probably
>> around 3-5% with a more stable setup for nops32, and a better win
>> for io_cqring_ev_posted() intensive cases like BPF.
> 
> Looks and tests good for me. I've skipped 16/16 for now, we can
> evaluate that one later.

For reference, running this on just the faster box:

Setup/Test   |  Peak-1-thread   Peak-2-threads   NOPS   Diff
------------------------------------------------------------------
Setup 2 pre  |      5.07M            5.74M       71.1M
Setup 2 post |      5.23M            5.84M       73.9M

which is a pretty substantial win: roughly 3% on 1-thread peak, 2% on
2-thread peak, and 4% on NOPS.
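
The Diff column in the table above is empty, so here is a minimal sketch
of how the pre/post figures translate into relative changes. The numbers
are copied from the table as given (in millions); the helper name
relative_gain is just an illustrative choice, not something from the
series.

```python
# Figures copied from the "Setup 2" pre/post table, in millions.
pre = {"Peak-1-thread": 5.07, "Peak-2-threads": 5.74, "NOPS": 71.1}
post = {"Peak-1-thread": 5.23, "Peak-2-threads": 5.84, "NOPS": 73.9}


def relative_gain(before, after):
    """Return the percentage change going from before to after."""
    return (after - before) / before * 100.0


# Percentage gain per test, keyed by the table's column names.
gains = {name: relative_gain(pre[name], post[name]) for name in pre}

for name, pct in gains.items():
    print(f"{name:15s} {pre[name]:6.2f}M -> {post[name]:6.2f}M  {pct:+.1f}%")
```

These are single runs on one box, so the percentages are only as stable
as the measurements behind them.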

-- 
Jens Axboe