On 7/25/20 12:24 PM, Pavel Begunkov wrote: > On 25/07/2020 18:45, Jens Axboe wrote: >> On 7/25/20 2:31 AM, Pavel Begunkov wrote: >>> That's not final for a several reasons, but good enough for discussion. >>> That brings io_kiocb down to 192B. I didn't try to benchmark it >>> properly, but quick nop test gave +5% throughput increase. >>> 7531 vs 7910 KIOPS with fio/t/io_uring >>> >>> The whole situation is obviously a bunch of tradeoffs. For instance, >>> instead of shrinking it, we can inline apoll to speed apoll path. >>> >>> [2/2] just for a reference, I'm thinking about other ways to shrink it. >>> e.g. ->link_list can be a single-linked list with linked tiemouts >>> storing a back-reference. This can turn out to be better, because >>> that would move ->fixed_file_refs to the 2nd cacheline, so we won't >>> ever touch 3rd cacheline in the submission path. >>> Any other ideas? >> >> Nothing noticeable for me, still about the same performance. But >> generally speaking, I don't necessarily think we need to go all in on >> making this as tiny as possible. It's much more important to chase the >> items where we only use 2 cachelines for the hot path, and then we have >> the extra space in there already for the semi hot paths like poll driven >> retry. Yes, we're still allocating from a pool that has slightly larger >> objects, but that doesn't really matter _that_ much. Avoiding an extra >> kmalloc+kfree for the semi hot paths are a bigger deal than making >> io_kiocb smaller and smaller. >> >> That said, for no-brainer changes, we absolutely should make it smaller. >> I just don't want to jump through convoluted hoops to get there. > > Agree, but that's not the end goal. The first point is to kill the union, > but it already has enough space for that. Right > The second is to see, whether we can use the space in a better way. From > the high level perspective ->apoll and ->work are alike and both serve to > provide asynchronous paths, hence the idea to swap them naturally comes to > mind. Totally agree, which is why the union of those kind of makes sense. We're definitely NOT using them at the same time, but the fact that we had various mm/creds/whatnot in the work_struct made that a bit iffy. > TBH, I don't think it'd do much, because init of ->io would probably > hide any benefit. There should be no ->io init/alloc for this test case. -- Jens Axboe