Re: [PATCH 0/2] task_put batching

On 7/20/20 10:06 AM, Pavel Begunkov wrote:
> On 20/07/2020 18:49, Jens Axboe wrote:
>> On 7/20/20 9:22 AM, Pavel Begunkov wrote:
>>> On 18/07/2020 17:37, Jens Axboe wrote:
>>>> On 7/18/20 2:32 AM, Pavel Begunkov wrote:
>>>>> For my somewhat exaggerated test case, perf continues to show high CPU
>>>>> consumption by io_dismantle_req(), and so by its caller,
>>>>> io_iopoll_complete(). The patch doesn't yield a throughput increase for
>>>>> my setup, probably because the effect is hidden behind polling, but it
>>>>> definitely improves the relative percentage. And the difference should
>>>>> only grow with an increasing number of CPUs. Another reason to have this
>>>>> is that atomics may affect other parallel tasks (e.g. ones that don't
>>>>> use io_uring).
>>>>>
>>>>> before:
>>>>> io_iopoll_complete: 5.29%
>>>>> io_dismantle_req:   2.16%
>>>>>
>>>>> after:
>>>>> io_iopoll_complete: 3.39%
>>>>> io_dismantle_req:   0.465%
>>>>
>>>> Still not seeing a win here, but it's clean and it _should_ work. For
>>>> some reason the gain in the task ref put ends up offset by growth in
>>>> fput_many(). Which doesn't (on the surface) make a lot of sense, but
>>>> may just mean that we have some weird side effects.
>>>
>>> It grows because the patch is garbage, the second condition is always false.
>>> See the diff. Could you please drop both patches?
>>
>> Hah, indeed. With this on top, it looks like it should in terms of
>> performance and profiles.
> 
> It just shows that it doesn't really matter for a single-threaded app,
> as expected. Worth throwing some contention at it, though. I'll think
> about finding some time to get/borrow a multi-threaded one.

But it kind of did here: it ended up being mostly a wash in terms of perf,
as my testing reported. With the incremental applied, it's up a bit
over before the task put batching.
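
For anyone following along, the idea behind task put batching can be
sketched in user space roughly as below. This is only an illustration under
stated assumptions, not the actual patch: struct task, struct req,
task_put_many() and the atomic_ops counter are hypothetical stand-ins, and
C11 atomics stand in for the kernel's refcounting. The point is that a run
of completions belonging to the same task can drop all of its references
with one atomic subtract instead of one per request.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted task (not the kernel's task_struct). */
struct task {
    atomic_int usage;   /* reference count */
    int atomic_ops;     /* number of atomic RMW ops issued, for illustration */
};

/* Hypothetical request that holds one reference on its task. */
struct req {
    struct task *task;
};

/* Drop nr references with a single atomic operation. */
static void task_put_many(struct task *t, int nr)
{
    assert(nr > 0);
    atomic_fetch_sub(&t->usage, nr);
    t->atomic_ops++;
}

/* Unbatched: one atomic decrement per completed request. */
static void complete_unbatched(struct req *reqs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        task_put_many(reqs[i].task, 1);
}

/* Batched: count puts for a run of the same task, flush when the task changes. */
static void complete_batched(struct req *reqs, size_t n)
{
    struct task *cur = NULL;
    int pending = 0;

    for (size_t i = 0; i < n; i++) {
        if (reqs[i].task != cur) {
            if (cur)
                task_put_many(cur, pending);   /* flush the previous run */
            cur = reqs[i].task;
            pending = 0;
        }
        pending++;
    }
    if (cur)
        task_put_many(cur, pending);           /* flush the final run */
}
```

With eight requests all owned by one task, the unbatched loop issues eight
atomic operations while the batched loop issues one. That is the kind of
saving that should matter more once several CPUs contend on the same
counter.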

>> I can just fold this into the existing one, if you'd like.
> 
> Would be nice. I'm going to double-check the counter and re-measure anyway.
> BTW, how did you find it? A tool or a proc file would be awesome.

For this kind of testing, I just use t/io_uring out of fio. It's probably
the lowest overhead kind of tool:

# sudo taskset -c 0 t/io_uring -b512 -p1 /dev/nvme2n1

-- 
Jens Axboe



