Hi,

We now do decent batching of allocations for submit, but we still
complete requests individually. This costs a lot of CPU cycles.

This patchset adds support for collecting requests for completion,
and then completing them as a batch. This includes things like freeing
a batch of tags.

This version is looking pretty good to me now, and should be ready
for 5.16.

Changes since v2:
- Get rid of dev_id
- Get rid of mq_ops->complete_batch
- Drop now unnecessary ib->complete setting in blk_poll()
- Drop one sbitmap patch that was questionable
- Rename io_batch to io_comp_batch
- Track need_timestamp on per-iob basis instead of for each request
- Drop elevator support for batching, cleaner without
- Make the batched driver addition simpler
- Unify nvme polled/irq handling
- Drop io_uring file checking, no longer needed
- Clean up io_uring completion side

-- 
Jens Axboe