Re: [RFC 0/9] scrap 24 bytes from io_kiocb

On 7/12/20 11:34 AM, Pavel Begunkov wrote:
> On 12/07/2020 18:59, Jens Axboe wrote:
>> On 7/12/20 3:41 AM, Pavel Begunkov wrote:
>>> Make io_kiocb slimmer by 24 bytes, mainly by revising list usage. The
>>> drawback is an extra kmalloc in the draining path, but that's a slow
>>> path, so meh. It also frees some space for the deferred completion path
>>> if that's needed in the future, but the main idea here is to shrink it
>>> to 3 cachelines in the end.
>>>
>>> I'm not happy yet with a few details, so that's not final, but it would
>>> be lovely to hear some feedback.
>>
>> I think it looks pretty good, most of the changes are straightforward.
>> Adding a completion entry that shares the submit space is a good idea,
>> and really helps bring it together.
>>
>> From a quick look, the only part I'm not super crazy about is patch #3.
> 
> Thanks!
> 
>> I'd probably rather use a generic list name and not unionize the tw
>> lists.
> 
> I don't care much, but without the compiler's help I always have trouble
> finding and distinguishing something as generic as "list".

To me, it's easier to verify that we're doing the right thing when they
use the same list member. Otherwise you have to cross-reference two
different names, and it's easier to shoot yourself in the foot that way.
So I'd prefer just retaining it as 'list' or something generic.
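
As a rough illustration of the single-member approach (a userspace
sketch only, not the io_uring code: struct demo_req, demo_requeue() and
the small list helpers are hypothetical stand-ins for the kernel's
list_head API), every queue links the request through the same 'list'
field, so there is only one name to audit:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail_node(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_node(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);	/* leave the node self-linked, i.e. "on no list" */
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Hypothetical request: one generic member, reused by every queue. */
struct demo_req {
	int id;
	struct list_head list;
};

/* Move a request between queues through the same member. */
static void demo_requeue(struct demo_req *req, struct list_head *to)
{
	assert(req != NULL && to != NULL);
	list_del_node(&req->list);
	list_add_tail_node(&req->list, to);
}
```

With a union of differently named members, the same move would have to
remove via one name and add via another, which is the cross-referencing
problem described above.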

> BTW, I figured out how to bring it down to 3 cache lines, but that would
> require taking io_wq_work out of io_kiocb and kmalloc'ing it on demand.
> And there should also be a bunch of nice side effects like improving apoll.

How would this work with the current use of io_wq_work as storage for
whatever bits we're hanging on to? I guess it could work with a prep
series first more cleanly separating it, though I do feel like we've
been getting closer to that already.

Definitely always interested in shrinking io_kiocb, just need to keep
an eye out for the fast(er) paths not needing two allocations (and
frees) for a single request.
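
To make the allocation concern concrete, here is a minimal userspace
sketch of on-demand allocation. Everything in it is an assumption for
illustration: struct demo_req, struct demo_work, demo_req_get_work() and
the counter are hypothetical, and calloc()/free() stand in for whatever
allocator the real code would use. The point is only that a request
which never asks for the work struct pays for one allocation, not two:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-request work state. */
struct demo_work {
	int flags;
};

/* Hypothetical request that holds only a pointer to the work state. */
struct demo_req {
	int id;
	struct demo_work *work;	/* NULL until a slow path needs it */
};

/* Counts second allocations so the fast path can be checked. */
static int demo_work_allocs;

/* Allocate the work state lazily; returns NULL if allocation fails. */
static struct demo_work *demo_req_get_work(struct demo_req *req)
{
	assert(req != NULL);
	if (!req->work) {
		req->work = calloc(1, sizeof(*req->work));
		if (req->work)
			demo_work_allocs++;
	}
	return req->work;
}

/* Release the work state if the request ever allocated one. */
static void demo_req_put_work(struct demo_req *req)
{
	free(req->work);	/* free(NULL) is a no-op */
	req->work = NULL;
}
```

A request completed inline never calls demo_req_get_work(), so the
counter stays at zero for it; only requests that take the slow path pay
for the second allocation and free.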

-- 
Jens Axboe



