Re: [RFC 0/2] 3 cacheline io_kiocb

On 7/25/20 12:24 PM, Pavel Begunkov wrote:
> On 25/07/2020 18:45, Jens Axboe wrote:
>> On 7/25/20 2:31 AM, Pavel Begunkov wrote:
>>> That's not final for several reasons, but good enough for discussion.
>>> That brings io_kiocb down to 192B. I didn't try to benchmark it
>>> properly, but a quick nop test gave a +5% throughput increase.
>>> 7531 vs 7910 KIOPS with fio/t/io_uring
>>>
>>> The whole situation is obviously a bunch of tradeoffs. For instance,
>>> instead of shrinking it, we can inline apoll to speed up the apoll path.
>>>
>>> [2/2] is just for reference, I'm thinking about other ways to shrink it.
>>> e.g. ->link_list can be a singly linked list with linked timeouts
>>> storing a back-reference. This can turn out to be better, because
>>> that would move ->fixed_file_refs to the 2nd cacheline, so we won't
>>> ever touch 3rd cacheline in the submission path.
>>> Any other ideas?
>>
>> Nothing noticeable for me, still about the same performance. But
>> generally speaking, I don't necessarily think we need to go all in on
>> making this as tiny as possible. It's much more important to chase the
>> items where we only use 2 cachelines for the hot path, and then we have
>> the extra space in there already for the semi hot paths like poll driven
>> retry. Yes, we're still allocating from a pool that has slightly larger
>> objects, but that doesn't really matter _that_ much. Avoiding an extra
>> kmalloc+kfree for the semi hot paths is a bigger deal than making
>> io_kiocb smaller and smaller.
>>
>> That said, for no-brainer changes, we absolutely should make it smaller.
>> I just don't want to jump through convoluted hoops to get there.
> 
> Agree, but that's not the end goal. The first point is to kill the union,
> but it already has enough space for that.

Right

> The second is to see whether we can use the space in a better way. From
> the high level perspective ->apoll and ->work are alike and both serve to
> provide asynchronous paths, hence the idea to swap them naturally comes to
> mind.

Totally agree, which is why the union of those kind of makes sense.
We're definitely NOT using them at the same time, but the fact that we
had various mm/creds/whatnot in the work_struct made that a bit iffy.
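
To illustrate the union idea discussed above, here is a minimal sketch.
Everything in it is hypothetical: demo_work, demo_apoll, demo_req and the
field names are made-up stand-ins, not the real io_uring structures, and a
64-byte cacheline is assumed. It only shows how two never-simultaneous
async states can share storage in the third cacheline while the first two
stay reserved for the hot path.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE 64 /* assumption: 64-byte cachelines */

/* Hypothetical stand-ins, NOT the kernel's definitions. */
struct demo_work  { void *mm; void *creds; void *files; void *fs; };
struct demo_apoll { void *poll; void *double_poll; };

struct demo_req {
	/* cachelines 0-1: placeholder for hot submission-path fields */
	unsigned char hot[2 * CACHELINE];
	/* cacheline 2: async state, only one member is live at a time */
	union {
		struct demo_work  work;
		struct demo_apoll apoll;
	} async;
};

/* Index of the cacheline that a given byte offset falls into. */
static inline size_t cacheline_of(size_t offset)
{
	return offset / CACHELINE;
}

/* The sketch request must fit in three cachelines (192B with 64B lines). */
_Static_assert(sizeof(struct demo_req) <= 3 * CACHELINE,
	       "demo_req larger than 3 cachelines");
```

The union costs only the size of its larger member, so whichever async
path a request takes, it reuses the same space rather than needing a
separate allocation.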

> TBH, I don't think it'd do much, because init of ->io would probably
> hide any benefit.

There should be no ->io init/alloc for this test case.

-- 
Jens Axboe