On 25/07/2020 18:45, Jens Axboe wrote:
> On 7/25/20 2:31 AM, Pavel Begunkov wrote:
>> That's not final for several reasons, but good enough for discussion.
>> That brings io_kiocb down to 192B. I didn't try to benchmark it
>> properly, but a quick nop test gave +5% throughput increase.
>> 7531 vs 7910 KIOPS with fio/t/io_uring
>>
>> The whole situation is obviously a bunch of tradeoffs. For instance,
>> instead of shrinking it, we can inline apoll to speed up the apoll path.
>>
>> [2/2] is just for reference, I'm thinking about other ways to shrink it.
>> e.g. ->link_list can be a singly-linked list with linked timeouts
>> storing a back-reference. This can turn out to be better, because
>> that would move ->fixed_file_refs to the 2nd cacheline, so we won't
>> ever touch the 3rd cacheline in the submission path.
>> Any other ideas?
>
> Nothing noticeable for me, still about the same performance. But
> generally speaking, I don't necessarily think we need to go all in on
> making this as tiny as possible. It's much more important to chase the
> items where we only use 2 cachelines for the hot path, and then we have
> the extra space in there already for the semi hot paths like poll driven
> retry. Yes, we're still allocating from a pool that has slightly larger
> objects, but that doesn't really matter _that_ much. Avoiding an extra
> kmalloc+kfree for the semi hot paths is a bigger deal than making
> io_kiocb smaller and smaller.
>
> That said, for no-brainer changes, we absolutely should make it smaller.
> I just don't want to jump through convoluted hoops to get there.

Agree, but that's not the end goal. The first point is to kill the
union, but it already has enough space for that. The second is to see
whether we can use the space in a better way. From a high-level
perspective ->apoll and ->work are alike and both serve to provide
asynchronous paths, hence the idea to swap them naturally comes to mind.

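To make the cacheline argument concrete, here is a rough userspace
sketch of how such a layout constraint could be pinned down at compile
time. It is an illustration only and not the real io_kiocb: the struct
name, the field sizes, the placement of the fields and the 64-byte
cacheline are all assumptions; only the 192B total and the idea of
keeping ->fixed_file_refs within the first two cachelines come from
the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cacheline size; 64 bytes is common but not universal. */
#define CACHELINE 64

/*
 * Hypothetical request layout. Field names loosely echo the thread,
 * but sizes and ordering are made up for the example. uint64_t stands
 * in for pointers so the offsets do not depend on the pointer width.
 */
struct req_sketch {
	/* 1st cacheline: touched on every submission */
	uint8_t  op_data[64];
	/* 2nd cacheline: still part of the hot submission path */
	uint8_t  hot_state[56];
	uint64_t fixed_file_refs;
	/* 3rd cacheline: semi-hot async paths (poll retry, io-wq) */
	uint64_t apoll;
	uint8_t  work[56];
};

/* Index of the cacheline that a given byte offset falls into. */
static inline size_t cacheline_of(size_t offset)
{
	return offset / CACHELINE;
}

/* Fail the build if the layout drifts away from the intended split. */
_Static_assert(sizeof(struct req_sketch) == 192,
	       "request sketch should be exactly three cachelines");
_Static_assert(offsetof(struct req_sketch, fixed_file_refs) / CACHELINE == 1,
	       "fixed_file_refs should stay in the 2nd cacheline");
_Static_assert(offsetof(struct req_sketch, apoll) / CACHELINE == 2,
	       "apoll is expected in the 3rd cacheline");
```

With checks like these, a reordering that pushes a hot field into the
3rd cacheline breaks the build instead of quietly costing a cache miss.
For the real structure, a tool such as pahole would show the actual
offsets.
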
TBH, I don't think it'd do much, because init of ->io would probably
hide any benefit.

-- 
Pavel Begunkov