Re: [RFC 0/2] 3 cacheline io_kiocb

Jens Axboe <axboe@xxxxxxxxx> · Sat, 25 Jul 2020 09:45:48 -0600

On 7/25/20 2:31 AM, Pavel Begunkov wrote:
> That's not final for several reasons, but good enough for discussion.
> That brings io_kiocb down to 192B. I didn't try to benchmark it
> properly, but a quick nop test gave a +5% throughput increase.
> 7531 vs 7910 KIOPS with fio/t/io_uring
> 
> The whole situation is obviously a bunch of tradeoffs. For instance,
> instead of shrinking it, we can inline apoll to speed up the apoll path.
> 
> [2/2] is just for reference, I'm thinking about other ways to shrink it.
> e.g. ->link_list can be a singly-linked list with linked timeouts
> storing a back-reference. This can turn out to be better, because
> that would move ->fixed_file_refs to the 2nd cacheline, so we won't
> ever touch 3rd cacheline in the submission path.
> Any other ideas?

Nothing noticeable for me, still about the same performance. But
generally speaking, I don't necessarily think we need to go all in on
making this as tiny as possible. It's much more important to chase the
items where we only use 2 cachelines for the hot path, and then we have
the extra space in there already for the semi hot paths like poll driven
retry. Yes, we're still allocating from a pool that has slightly larger
objects, but that doesn't really matter _that_ much. Avoiding an extra
kmalloc+kfree for the semi hot paths is a bigger deal than making
io_kiocb smaller and smaller.
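
To make the layout goal above concrete, here is a minimal sketch of how
such a constraint could be checked at compile time. It assumes 64-byte
cachelines, and struct demo_req, struct demo_poll, FIELD_END() and
demo_req_hot_path_ok() are made-up names for illustration only; none of
this is the real io_kiocb layout. The hot fields stay within the first
two cachelines, and the semi hot poll state is inlined in the third
instead of being allocated separately.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

#define CACHELINE 64  /* assumed cacheline size in bytes */

/* End offset (exclusive) of a member within its struct. */
#define FIELD_END(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical poll state that would otherwise need its own kmalloc. */
struct demo_poll {
	void *head;
	unsigned int events;
};

/* Hypothetical request structure; NOT the real io_kiocb. */
struct demo_req {
	/* cachelines 0-1: everything the submission path touches */
	void *file;
	char op_data[64];		/* per-opcode data */
	void *ctx;
	unsigned long long user_data;
	unsigned int flags;
	unsigned int refs;
	unsigned char opcode;

	/* cacheline 2: semi hot state, inlined to avoid an extra allocation */
	_Alignas(CACHELINE) struct demo_poll inline_poll;
};

/* Fail the build if a hot field drifts into the third cacheline. */
static_assert(FIELD_END(struct demo_req, opcode) <= 2 * CACHELINE,
	      "hot path fields must fit in the first two cachelines");
static_assert(offsetof(struct demo_req, inline_poll) == 2 * CACHELINE,
	      "semi hot state should start the third cacheline");
static_assert(sizeof(struct demo_req) == 3 * CACHELINE,
	      "request should stay at three cachelines");

/* Runtime variant of the first check, usable from a unit test. */
int demo_req_hot_path_ok(void)
{
	return FIELD_END(struct demo_req, opcode) <= 2 * CACHELINE;
}
```

With checks like these in place, adding a field to the hot section
either still fits or breaks the build, so the tradeoff gets noticed
when the change is made rather than later in a benchmark.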

That said, for no-brainer changes, we absolutely should make it smaller.
I just don't want to jump through convoluted hoops to get there.

-- 
Jens Axboe