Re: [PATCH 5.11 2/2] io_uring: don't take percpu_ref operations for registered files in IOPOLL mode

Jens Axboe <axboe@xxxxxxxxx> · Tue, 17 Nov 2020 09:30:40 -0700

On 11/17/20 3:43 AM, Pavel Begunkov wrote:
> On 17/11/2020 06:17, Xiaoguang Wang wrote:
>> In io_file_get() and io_put_file(), we currently use percpu_ref_get() and
>> percpu_ref_put() for registered files, but it's hard to say they're very
>> lightweight synchronization primitives. On one of our x86 machines, I get
>> the perf data below (registered files enabled):
>> Samples: 480K of event 'cycles', Event count (approx.): 298552867297
>> Overhead  Comman  Shared Object     Symbol
>>    0.45%  :53243  [kernel.vmlinux]  [k] io_file_get
> 
> Do you have throughput/latency numbers? In my experience with polling,
> for such small overheads any CPU cycles you win earlier in the stack will
> just be burned on polling, because you would still wait the same fixed*
> time for the next response from the device. fixed* here means post-factum,
> but still mostly independent of how your host machine behaves.

That's only true if you can max out the device with a single core.
Otherwise, if your device isn't the bottleneck, freeing any cycles
directly translates into a performance win. For the high performance
testing I've done, the actual polling isn't the bottleneck, it's the rest
of the stack.
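
To make the two cases concrete, here is a rough back-of-the-envelope model, not a measurement. It assumes per-core throughput is simply the lower of what the CPU can submit/complete and what the device can deliver. The function name and every number in it (clock rate, cycles per IO, the 1% saving, device limits) are hypothetical and chosen only for illustration.

```python
def achievable_iops(cpu_hz, cycles_per_io, device_iops):
    """Toy model: one core's IOPS is capped by the CPU cost per IO or by the device."""
    cpu_bound_iops = cpu_hz / cycles_per_io
    return min(cpu_bound_iops, device_iops)


# All values below are made-up illustration numbers, not perf data.
CPU_HZ = 3_000_000_000   # hypothetical 3 GHz core
CYCLES_BEFORE = 2_000    # hypothetical per-IO cost of the whole stack
CYCLES_AFTER = 1_980     # same stack with a hypothetical 1% saving

SLOW_DEVICE_IOPS = 500_000     # a single core can saturate this device
FAST_DEVICE_IOPS = 5_000_000   # a single core cannot saturate this device

# Device-bound: the saved cycles are just spent polling for completions.
slow_before = achievable_iops(CPU_HZ, CYCLES_BEFORE, SLOW_DEVICE_IOPS)
slow_after = achievable_iops(CPU_HZ, CYCLES_AFTER, SLOW_DEVICE_IOPS)

# CPU-bound: every saved cycle shows up as extra throughput.
fast_before = achievable_iops(CPU_HZ, CYCLES_BEFORE, FAST_DEVICE_IOPS)
fast_after = achievable_iops(CPU_HZ, CYCLES_AFTER, FAST_DEVICE_IOPS)

print(f"device-bound: {slow_before:.0f} -> {slow_after:.0f} IOPS")
print(f"cpu-bound:    {fast_before:.0f} -> {fast_after:.0f} IOPS")
```

In this model the saving is invisible when the device is the limit and shows up in full when the core is the limit, which is the distinction drawn above.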

-- 
Jens Axboe