Re: [LSF/MM/BPF TOPIC] Block IO performance per core?

Nitesh Shetty <nitheshshetty@xxxxxxxxx> · Tue, 11 Feb 2025 22:20:51 +0530

On Fri, Feb 7, 2025 at 1:39 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> While I'm always interested in making per-core IOPS better as it relates
> to better efficiency in the IO stack, and have done a LOT of work in
> this area in the past, for this particular case it's also worth
> highlighting that I bet you could get a lot better performance by doing
> something smarter with polling multiple devices than what t/io_uring is
> currently doing - completing 32 requests on each device before moving on
> to the other one is probably not the best approach. t/io_uring is simply
> not designed very well for that.
>
> IOW, I do like this topic, but I think it'd be worthwhile to generate
> some better numbers with a more targeted approach to polling multiple
> devices from a single thread first rather than take t/io_uring in its
> current form as gospel on that front.
>
Agreed. t/io_uring can be the starting point.
At present, with a single thread, the first half of the queue depth is
occupied by one device and the second half by the other device.
I tried changing the order to interleave the devices, but overall I see
no gain in performance. Instead, performance drops somewhat, depending
on the type of interleaving, from 5M IOPS to 3.8-4.6M IOPS.
With respect to polling multiple devices, do you have some other scheme
in mind?

Thank you,
Nitesh Shetty