Re: [PATCH next 0/9] for-next clean ups and micro optimisation

Jens Axboe <axboe@xxxxxxxxx> · Tue, 12 Apr 2022 09:12:16 -0600

On 4/12/22 9:05 AM, Jens Axboe wrote:
> On 4/12/22 8:09 AM, Pavel Begunkov wrote:
>> nops benchmark: 40.3 -> 41.1 MIOPS, or +2%
>>
>> Pavel Begunkov (9):
>>   io_uring: explicitly keep a CQE in io_kiocb
>>   io_uring: memcpy CQE from req
>>   io_uring: shrink final link flush
>>   io_uring: inline io_flush_cached_reqs
>>   io_uring: helper for empty req cache checks
>>   io_uring: add helper to return req to cache list
>>   io_uring: optimise submission loop invariant
>>   io_uring: optimise submission left counting
>>   io_uring: optimise io_get_cqe()
>>
>>  fs/io_uring.c | 288 +++++++++++++++++++++++++++++---------------------
>>  1 file changed, 165 insertions(+), 123 deletions(-)
> 
> Get about ~4% on aarch64. I like both main changes, memcpy of cqe and
> the improvements to io_get_cqe().

I ran the nop tests on the 12900K and see about an 8% improvement
there, going from ~88M to 95M IOPS. I didn't split the series to check
which part accounts for most of the improvement.

-- 
Jens Axboe