On Fri, Feb 7, 2025 at 1:39 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> While I'm always interested in making per-core IOPS better as it relates
> to better efficiency in the IO stack, and have done a LOT of work in
> this area in the past, for this particular case it's also worth
> highlighting that I bet you could get a lot better performance by doing
> something smarter with polling multiple devices than what t/io_uring is
> currently doing - completing 32 requests on each device before moving on
> to the other one is probably not the best approach. t/io_uring is simply
> not designed very well for that.
>
> IOW, I do like this topic, but I think it'd be worthwhile to generate
> some better numbers with a more targeted approach to polling multiple
> devices from a single thread first rather than take t/io_uring in its
> current form as gospel on that front.
>

Agreed. t/io_uring can be the starting point.

At present, I see that for a single thread, half of the queue depth is
occupied by one device, followed by the second device. I tried changing
the order to interleave the devices, but overall I saw no gain in
performance. In fact, there was some drop in performance, depending on
the type of interleaving, from 5M IOPS to 3.8~4.6M IOPS.

With respect to polling multiple devices, do you have some other scheme
in mind?

Thank you,
Nitesh Shetty