On 7/13/20 2:45 PM, Pavel Begunkov wrote:
> On 13/07/2020 17:12, Jens Axboe wrote:
>> On 7/13/20 2:17 AM, Pavel Begunkov wrote:
>>> On 12/07/2020 23:32, Jens Axboe wrote:
>>>> On 7/12/20 11:34 AM, Pavel Begunkov wrote:
>>>>> On 12/07/2020 18:59, Jens Axboe wrote:
>>>>>> On 7/12/20 3:41 AM, Pavel Begunkov wrote:
>>>>>>> Make io_kiocb slimmer by 24 bytes mainly by revising lists usage. The
>>>>>>> drawback is adding extra kmalloc in draining path, but that's a slow
>>>>>>> path, so meh. It also frees some space for the deferred completion path
>>>>>>> if would be needed in the future, but the main idea here is to shrink it
>>>>>>> to 3 cachelines in the end.
>>>>>>>
>>>>>>> I'm not happy yet with a few details, so that's not final, but it would
>>>>>>> be lovely to hear some feedback.
>>>>>>
>>>>>> I think it looks pretty good, most of the changes are straight forward.
>>>>>> Adding a completion entry that shares the submit space is a good idea,
>>>>>> and really helps bring it together.
>>>>>>
>>>>>> From a quick look, the only part I'm not super crazy about is patch #3.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>> I'd probably rather use a generic list name and not unionize the tw
>>>>>> lists.
>>>>>
>>>>> I don't care much, but without compiler's help always have troubles
>>>>> finding and distinguishing something as generic as "list".
>>>>
>>>> To me, it's easier to verify that we're doing the right thing when they
>>>> use the same list member. Otherwise you have to cross reference two
>>>> different names, easier to shoot yourself in the foot that way. So I'd
>>>> prefer just retaining it as 'list' or something generic.
>>>
>>> If you don't have objections, I'll just leave it "inflight_entry". This
>>> one is easy to grep.
>>
>> Sure, don't have strong feelings on the actual name.
>>
>>>>> BTW, I thought out how to bring it down to 3 cache lines, but that would
>>>>> require taking io_wq_work out of io_kiocb and kmalloc'ing it on demand.
>>>>> And there should also be a bunch of nice side effects like improving apoll.
>>>>
>>>> How would this work with the current use of io_wq_work as storage for
>>>> whatever bits we're hanging on to? I guess it could work with a prep
>>>> series first more cleanly separating it, though I do feel like we've
>>>> been getting closer to that already.
>>>
>>> It's definitely not a single patch. I'm going to prepare a series for
>>> discussion later, and then we'll see whether it worth it.
>>
>> Definitely not. Let's flesh this one out first, then we can move on.
>
> But not a lot of work either.

Great

> I've got a bit lost, do you mean to flesh out the idea or this
> "loose 24 bytes" series?

The latter, but I'm already looking over your v2, so I guess that's
taken care of.

-- 
Jens Axboe