On 1/17/24 09:48, Jens Axboe wrote:
>> When posting this patch series, please include performance results
>> (IOPS) for a zoned null_blk device instance. mq-deadline doesn't support
>> more than 200 K IOPS, which is less than what UFS devices support. I
>> hope that this performance bottleneck will be solved with the new
>> approach.
>
> Not really zone related, but I was very aware of the single lock
> limitations when I ported deadline to blk-mq. Was always hoping that
> someone would actually take the time to make it more efficient, but so
> far that hasn't happened. Or maybe it'll be a case of "just do it
> yourself, Jens" at some point...

Hi Jens,
I think it is something fundamental rather than something that can be
fixed. The I/O scheduling algorithms in mq-deadline and BFQ require
knowledge of all pending I/O requests. This implies maintaining data
structures that are shared across all CPU cores, and making those
structures thread-safe requires synchronization mechanisms that span
all CPU cores. I think this is where the roughly 200 K IOPS bottleneck
comes from.
Additionally, the faster storage devices become, the larger the relative
overhead of an I/O scheduler becomes (assuming that I/O schedulers won't
become significantly faster).
There is also a fundamental limitation: achieving good performance
requires submitting multiple commands to the storage device
simultaneously. However, as soon as the queue depth is larger than one,
the device itself has some control over the order in which the queued
commands are executed, which undermines the ordering decisions the I/O
scheduler made.
For all of the above reasons I'm recommending that my colleagues move
I/O prioritization into the storage device and that we evolve towards a
future in which solid-state storage devices are used without I/O
schedulers. I/O schedulers will probably remain important for rotating
magnetic media.
Thank you,
Bart.