Hi,

With the support in 5.16-rc1 for allocating and completing batches of
IO, the one missing piece is passing down a list of requests for issue.
Drivers can take advantage of this by defining an mq_ops->queue_rqs()
hook.

This implements it for NVMe, allowing copy of multiple commands in one
swoop.

This is good for around a 500K IOPS/core improvement in my testing,
which is around a 5-6% improvement in efficiency.

No changes since v3 outside of a comment addition.

Changes since v2:
- Add comment on why shared tags are currently bypassed
- Add reviewed-by's

-- 
Jens Axboe