Re: [RFC 0/9] scrap 24 bytes from io_kiocb

Jens Axboe <axboe@xxxxxxxxx> · Mon, 13 Jul 2020 15:00:34 -0600

On 7/13/20 2:45 PM, Pavel Begunkov wrote:
> On 13/07/2020 17:12, Jens Axboe wrote:
>> On 7/13/20 2:17 AM, Pavel Begunkov wrote:
>>> On 12/07/2020 23:32, Jens Axboe wrote:
>>>> On 7/12/20 11:34 AM, Pavel Begunkov wrote:
>>>>> On 12/07/2020 18:59, Jens Axboe wrote:
>>>>>> On 7/12/20 3:41 AM, Pavel Begunkov wrote:
>>>>>>> Make io_kiocb slimmer by 24 bytes, mainly by revising how lists are used.
>>>>>>> The drawback is an extra kmalloc in the draining path, but that's a slow
>>>>>>> path, so meh. It also frees some space for the deferred completion path
>>>>>>> if it's needed in the future, but the main goal here is to shrink it
>>>>>>> to 3 cachelines in the end.
>>>>>>>
>>>>>>> I'm not yet happy with a few details, so this isn't final, but it would
>>>>>>> be lovely to hear some feedback.
>>>>>>
>>>>>> I think it looks pretty good; most of the changes are straightforward.
>>>>>> Adding a completion entry that shares the submit space is a good idea,
>>>>>> and really helps bring it together.
>>>>>>
>>>>>> From a quick look, the only part I'm not super crazy about is patch #3.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>> I'd probably rather use a generic list name and not unionize the tw
>>>>>> lists.
>>>>>
>>>>> I don't care much, but without the compiler's help I always have trouble
>>>>> finding and distinguishing something as generic as "list".
>>>>
>>>> To me, it's easier to verify that we're doing the right thing when they
>>>> use the same list member. Otherwise you have to cross-reference two
>>>> different names, and it's easier to shoot yourself in the foot that way. So I'd
>>>> prefer just retaining it as 'list' or something generic.
>>>
>>> If you have no objections, I'll just name it "inflight_entry", which
>>> is easy to grep.
>>
>> Sure, don't have strong feelings on the actual name.
>>
>>>>> BTW, I've worked out how to bring it down to 3 cache lines, but that would
>>>>> require taking io_wq_work out of io_kiocb and kmalloc'ing it on demand.
>>>>> And there should also be a bunch of nice side effects like improving apoll.
>>>>
>>>> How would this work with the current use of io_wq_work as storage for
>>>> whatever bits we're hanging on to? I guess it could work with a prep
>>>> series that first separates it more cleanly, though I do feel like we've
>>>> been getting closer to that already.
>>>
>>> It's definitely not a single patch. I'm going to prepare a series for
>>> discussion later, and then we'll see whether it's worth it.
>>
>> Definitely not. Let's flesh this one out first, then we can move on.
> 
> But not a lot of work either.

Great

> I got a bit lost: do you mean fleshing out the idea, or this
> "lose 24 bytes" series?

The latter, but I'm already looking over your v2, so I guess that's
taken care of.
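To illustrate the space-sharing idea discussed above (a completion entry overlaying submit-side fields), here is a minimal sketch. All struct, field, and function names are hypothetical and do not mirror the real io_kiocb layout; the only assumption is that the submit-side and completion-side state are never live at the same time, so an anonymous union can let them share bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical doubly linked list node, modeled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* Hypothetical state needed only while the request is being submitted. */
struct submit_state {
	struct list_node link;
	unsigned int flags;
};

/* Hypothetical state needed only after the request has completed. */
struct completion_state {
	struct list_node list;
	long result;
	unsigned int cflags;
};

/* Layout A: each phase gets its own storage. */
struct req_separate {
	void *file;
	struct submit_state submit;
	struct completion_state completion;
};

/*
 * Layout B: the two phases are never live at the same time, so an
 * anonymous union (C11) lets them overlay the same bytes.
 */
struct req_shared {
	void *file;
	union {
		struct submit_state submit;
		struct completion_state completion;
	};
};

/* A union is sized for its largest member (plus padding), so this holds. */
_Static_assert(sizeof(struct req_shared) < sizeof(struct req_separate),
	       "sharing storage should shrink the request");

/* Number of bytes saved by overlaying the two phase-specific parts. */
size_t bytes_saved(void)
{
	return sizeof(struct req_separate) - sizeof(struct req_shared);
}

/*
 * Switching phases: once submission is done, the submit-side fields are
 * dead, and the same storage is reinitialised as completion state.
 */
void complete_request(struct req_shared *req, long res)
{
	req->completion.list.next = &req->completion.list;
	req->completion.list.prev = &req->completion.list;
	req->completion.result = res;
	req->completion.cflags = 0;
}
```

The exact savings depend on the ABI's padding rules, so the sketch only asserts that the shared layout is smaller rather than hard-coding a byte count. The trade-off is the one raised in the thread: the compiler cannot stop you from touching the wrong union member, which is why a single, well-named, greppable list member is easier to audit.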

-- 
Jens Axboe