Re: [RFC 0/9] scrap 24 bytes from io_kiocb

Pavel Begunkov <asml.silence@xxxxxxxxx> · Mon, 13 Jul 2020 23:45:35 +0300

On 13/07/2020 17:12, Jens Axboe wrote:
> On 7/13/20 2:17 AM, Pavel Begunkov wrote:
>> On 12/07/2020 23:32, Jens Axboe wrote:
>>> On 7/12/20 11:34 AM, Pavel Begunkov wrote:
>>>> On 12/07/2020 18:59, Jens Axboe wrote:
>>>>> On 7/12/20 3:41 AM, Pavel Begunkov wrote:
>>>>>> Make io_kiocb slimmer by 24 bytes, mainly by revising list usage. The
>>>>>> drawback is adding an extra kmalloc in the draining path, but that's a
>>>>>> slow path, so meh. It also frees some space for the deferred completion
>>>>>> path if that's needed in the future, but the main idea here is to shrink
>>>>>> it to 3 cachelines in the end.
>>>>>>
>>>>>> I'm not happy yet with a few details, so that's not final, but it would
>>>>>> be lovely to hear some feedback.
>>>>>
>>>>> I think it looks pretty good, most of the changes are straightforward.
>>>>> Adding a completion entry that shares the submit space is a good idea,
>>>>> and really helps bring it together.
>>>>>
>>>>> From a quick look, the only part I'm not super crazy about is patch #3.
>>>>
>>>> Thanks!
>>>>
>>>>> I'd probably rather use a generic list name and not unionize the tw
>>>>> lists.
>>>>
>>>> I don't care much, but without the compiler's help I always have trouble
>>>> finding and distinguishing something as generic as "list".
>>>
>>> To me, it's easier to verify that we're doing the right thing when they
>>> use the same list member. Otherwise you have to cross reference two
>>> different names, easier to shoot yourself in the foot that way. So I'd
>>> prefer just retaining it as 'list' or something generic.
>>
>> If you don't have objections, I'll just leave it "inflight_entry". This
>> one is easy to grep.
> 
> Sure, don't have strong feelings on the actual name.
> 
>>>> BTW, I figured out how to bring it down to 3 cache lines, but that would
>>>> require taking io_wq_work out of io_kiocb and kmalloc'ing it on demand.
>>>> And there should also be a bunch of nice side effects like improving apoll.
>>>
>>> How would this work with the current use of io_wq_work as storage for
>>> whatever bits we're hanging on to? I guess it could work with a prep
>>> series first more cleanly separating it, though I do feel like we've
>>> been getting closer to that already.
>>
>> It's definitely not a single patch. I'm going to prepare a series for
>> discussion later, and then we'll see whether it's worth it.
> 
> Definitely not. Let's flesh this one out first, then we can move on.

But not a lot of work either.
I'm a bit lost: do you mean flesh out the idea, or this
"lose 24 bytes" series?
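
For anyone following along, here is a rough sketch of the general trick
discussed above: letting two list entries that are never in use at the
same time share storage. It is an illustration only, not the io_kiocb
layout. The struct names, the "flags" field and "defer_entry" are
hypothetical; only the name "inflight_entry" comes from this thread, and
its use here is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list node, same shape as the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* Hypothetical "before" layout: one list node per use. */
struct req_separate {
	unsigned long flags;
	struct list_node defer_entry;		/* hypothetical field */
	struct list_node inflight_entry;	/* name from the thread, use assumed */
};

/*
 * Hypothetical "after" layout: both nodes share the same storage.
 * Only safe if a request is never on both lists at once.
 */
struct req_shared {
	unsigned long flags;
	union {
		struct list_node defer_entry;
		struct list_node inflight_entry;
	};
};

/* Bytes saved per request by sharing the storage. */
static size_t bytes_saved(void)
{
	return sizeof(struct req_separate) - sizeof(struct req_shared);
}
```

The saving is one list node per request. The cost is the one raised
earlier in the thread: with two names for the same storage, checking
that they are never live together means cross-referencing both.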

-- 
Pavel Begunkov