On 7/25/20 2:14 PM, Pavel Begunkov wrote: > On 25/07/2020 22:40, Jens Axboe wrote: >> On 7/25/20 12:24 PM, Pavel Begunkov wrote: >>> On 25/07/2020 18:45, Jens Axboe wrote: >>>> On 7/25/20 2:31 AM, Pavel Begunkov wrote: >>>>> That's not final for a several reasons, but good enough for discussion. >>>>> That brings io_kiocb down to 192B. I didn't try to benchmark it >>>>> properly, but quick nop test gave +5% throughput increase. >>>>> 7531 vs 7910 KIOPS with fio/t/io_uring >>>>> >>>>> The whole situation is obviously a bunch of tradeoffs. For instance, >>>>> instead of shrinking it, we can inline apoll to speed apoll path. >>>>> >>>>> [2/2] just for a reference, I'm thinking about other ways to shrink it. >>>>> e.g. ->link_list can be a single-linked list with linked tiemouts >>>>> storing a back-reference. This can turn out to be better, because >>>>> that would move ->fixed_file_refs to the 2nd cacheline, so we won't >>>>> ever touch 3rd cacheline in the submission path. >>>>> Any other ideas? >>>> >>>> Nothing noticeable for me, still about the same performance. But >>>> generally speaking, I don't necessarily think we need to go all in on >>>> making this as tiny as possible. It's much more important to chase the >>>> items where we only use 2 cachelines for the hot path, and then we have >>>> the extra space in there already for the semi hot paths like poll driven >>>> retry. Yes, we're still allocating from a pool that has slightly larger >>>> objects, but that doesn't really matter _that_ much. Avoiding an extra >>>> kmalloc+kfree for the semi hot paths are a bigger deal than making >>>> io_kiocb smaller and smaller. >>>> >>>> That said, for no-brainer changes, we absolutely should make it smaller. >>>> I just don't want to jump through convoluted hoops to get there. >>> >>> Agree, but that's not the end goal. The first point is to kill the union, >>> but it already has enough space for that. >> >> Right >> >>> The second is to see, whether we can use the space in a better way. From >>> the high level perspective ->apoll and ->work are alike and both serve to >>> provide asynchronous paths, hence the idea to swap them naturally comes to >>> mind. >> >> Totally agree, which is why the union of those kind of makes sense. >> We're definitely NOT using them at the same time, but the fact that we >> had various mm/creds/whatnot in the work_struct made that a bit iffy. > > Thinking of it, if combined with work de-init as you proposed before, it's > probably possible to make a layout similar to the one below > > struct io_kiocb { > ... > struct hlist_node hash_node; > struct callback_head task_work; > union { > struct io_wq_work work; > struct async_poll apoll; > }; > }; > > Saves ->apoll kmalloc(), and the actual work de-init would be negligibly > rare. Worth to try And hopefully get rid of the stupid apoll->work and the copy back and forth... But yes, this would be most excellent, and an ideal layout. >>> TBH, I don't think it'd do much, because init of ->io would probably >>> hide any benefit. >> >> There should be no ->io init/alloc for this test case. > > I mean, before getting into io_arm_poll_handler(), it should get -EAGAIN > in io_{read,write}() and initialise ->io in io_setup_async_rw(), at least > for READV, WRITEV. Sure, but for my testing, there's never an EAGAIN, so I won't be exercising that path for the peak testing. -- Jens Axboe