Hi,

With the support in 5.16-rc1 for allocating and completing batches of
IO, the one missing piece is passing down a list of requests for issue.
Drivers can take advantage of this by defining an mq_ops->queue_rqs()
hook.

This implements it for NVMe, allowing copy of multiple commands in one
swoop.

This is good for around a 500K IOPS/core improvement in my testing,
which is around a 5-6% improvement in efficiency.

Changes since v1:
- Addressed review comments
- Rebase on top of Ming's hctx lock change
- Clean ups
- Bypass for shared tags

-- 
Jens Axboe