On 25/07/2020 22:40, Jens Axboe wrote: > On 7/25/20 12:24 PM, Pavel Begunkov wrote: >> On 25/07/2020 18:45, Jens Axboe wrote: >>> On 7/25/20 2:31 AM, Pavel Begunkov wrote: >>>> That's not final for a several reasons, but good enough for discussion. >>>> That brings io_kiocb down to 192B. I didn't try to benchmark it >>>> properly, but quick nop test gave +5% throughput increase. >>>> 7531 vs 7910 KIOPS with fio/t/io_uring >>>> >>>> The whole situation is obviously a bunch of tradeoffs. For instance, >>>> instead of shrinking it, we can inline apoll to speed apoll path. >>>> >>>> [2/2] just for a reference, I'm thinking about other ways to shrink it. >>>> e.g. ->link_list can be a single-linked list with linked tiemouts >>>> storing a back-reference. This can turn out to be better, because >>>> that would move ->fixed_file_refs to the 2nd cacheline, so we won't >>>> ever touch 3rd cacheline in the submission path. >>>> Any other ideas? >>> >>> Nothing noticeable for me, still about the same performance. But >>> generally speaking, I don't necessarily think we need to go all in on >>> making this as tiny as possible. It's much more important to chase the >>> items where we only use 2 cachelines for the hot path, and then we have >>> the extra space in there already for the semi hot paths like poll driven >>> retry. Yes, we're still allocating from a pool that has slightly larger >>> objects, but that doesn't really matter _that_ much. Avoiding an extra >>> kmalloc+kfree for the semi hot paths are a bigger deal than making >>> io_kiocb smaller and smaller. >>> >>> That said, for no-brainer changes, we absolutely should make it smaller. >>> I just don't want to jump through convoluted hoops to get there. >> >> Agree, but that's not the end goal. The first point is to kill the union, >> but it already has enough space for that. > > Right > >> The second is to see, whether we can use the space in a better way. From >> the high level perspective ->apoll and ->work are alike and both serve to >> provide asynchronous paths, hence the idea to swap them naturally comes to >> mind. > > Totally agree, which is why the union of those kind of makes sense. > We're definitely NOT using them at the same time, but the fact that we > had various mm/creds/whatnot in the work_struct made that a bit iffy. Thinking of it, if combined with work de-init as you proposed before, it's probably possible to make a layout similar to the one below struct io_kiocb { ... struct hlist_node hash_node; struct callback_head task_work; union { struct io_wq_work work; struct async_poll apoll; }; }; Saves ->apoll kmalloc(), and the actual work de-init would be negligibly rare. Worth to try > >> TBH, I don't think it'd do much, because init of ->io would probably >> hide any benefit. > > There should be no ->io init/alloc for this test case. I mean, before getting into io_arm_poll_handler(), it should get -EAGAIN in io_{read,write}() and initialise ->io in io_setup_async_rw(), at least for READV, WRITEV. -- Pavel Begunkov