On 7/12/20 11:34 AM, Pavel Begunkov wrote:
> On 12/07/2020 18:59, Jens Axboe wrote:
>> On 7/12/20 3:41 AM, Pavel Begunkov wrote:
>>> Make io_kiocb slimmer by 24 bytes mainly by revising lists usage. The
>>> drawback is adding extra kmalloc in draining path, but that's a slow
>>> path, so meh. It also frees some space for the deferred completion path
>>> if would be needed in the future, but the main idea here is to shrink it
>>> to 3 cachelines in the end.
>>>
>>> I'm not happy yet with a few details, so that's not final, but it would
>>> be lovely to hear some feedback.
>>
>> I think it looks pretty good, most of the changes are straight forward.
>> Adding a completion entry that shares the submit space is a good idea,
>> and really helps bring it together.
>>
>> From a quick look, the only part I'm not super crazy about is patch #3.
>
> Thanks!
>
>> I'd probably rather use a generic list name and not unionize the tw
>> lists.
>
> I don't care much, but without compiler's help always have troubles
> finding and distinguishing something as generic as "list".

To me, it's easier to verify that we're doing the right thing when they
use the same list member. Otherwise you have to cross-reference two
different names, and it's easier to shoot yourself in the foot that way.
So I'd prefer just retaining it as 'list' or something generic.

> BTW, I thought out how to bring it down to 3 cache lines, but that would
> require taking io_wq_work out of io_kiocb and kmalloc'ing it on demand.
> And there should also be a bunch of nice side effects like improving apoll.

How would this work with the current use of io_wq_work as storage for
whatever bits we're hanging on to? I guess it could work with a prep
series first more cleanly separating it, though I do feel like we've
been getting closer to that already.

Definitely always interested in shrinking io_kiocb, just need to keep an
eye out for the fast(er) paths not needing two allocations (and frees)
for a single request.

-- 
Jens Axboe