Trivial to do now, just need our own io_batch on the stack and pass that
in to the usual command completion handling. I pondered making this
dependent on how many entries we had to process, but even for a single
entry there's no discernible difference in performance or latency.

Running a sync workload over io_uring:

t/io_uring -b512 -d1 -s1 -c1 -p0 -F1 -B1 -n2 /dev/nvme1n1 /dev/nvme2n1

yields the below performance before the patch:

IOPS=254820, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
IOPS=251174, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
IOPS=250806, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)

and the following after:

IOPS=255972, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
IOPS=251920, BW=123MiB/s, IOS/call=1/1, inflight=(1 1)
IOPS=251794, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)

which definitely isn't slower, and is about the same if you factor in a
bit of variance. For peak performance workloads, benchmarking shows a 2%
improvement.

Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
---
 drivers/nvme/host/pci.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ae253f6f5c80..061f0b1cb0ec 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1061,6 +1061,7 @@ static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
 
 static inline int nvme_process_cq(struct nvme_queue *nvmeq)
 {
+	DEFINE_IO_BATCH(iob);
 	int found = 0;
 
 	while (nvme_cqe_pending(nvmeq)) {
@@ -1070,12 +1071,15 @@ static inline int nvme_process_cq(struct nvme_queue *nvmeq)
 		 * the cqe requires a full read memory barrier
 		 */
 		dma_rmb();
-		nvme_handle_cqe(nvmeq, NULL, nvmeq->cq_head);
+		nvme_handle_cqe(nvmeq, &iob, nvmeq->cq_head);
 		nvme_update_cq_head(nvmeq);
 	}
 
-	if (found)
+	if (found) {
+		if (iob.req_list)
+			nvme_pci_complete_batch(&iob);
 		nvme_ring_cq_doorbell(nvmeq);
+	}
 
 	return found;
 }
-- 
2.33.0
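
To make the pattern in the diff easier to follow out of context, here is a
minimal, self-contained userspace C sketch of the same idea: completed
requests get linked onto an on-stack batch while the queue is drained, and
the whole batch is flushed in one go afterwards. The names used here
(struct batch, DEFINE_BATCH, handle_entry, complete_batch) are illustrative
stand-ins, not the kernel's io_batch/DEFINE_IO_BATCH API.

/*
 * Illustrative userspace sketch of the deferred-batch-completion pattern
 * described above; none of these names are the kernel API.
 */
#include <stdio.h>
#include <stddef.h>

/* Stand-in for a completed request. */
struct req {
	int id;
	struct req *next;
};

/* Stand-in for the on-stack batch: just a singly linked list head. */
struct batch {
	struct req *req_list;
};

#define DEFINE_BATCH(name)	struct batch name = { .req_list = NULL }

/*
 * Per-entry handler: instead of completing the request immediately,
 * link it onto the caller's batch so it can be completed later.
 */
static void handle_entry(struct batch *b, struct req *rq)
{
	rq->next = b->req_list;
	b->req_list = rq;
}

/* Complete everything that was queued on the batch in one pass. */
static void complete_batch(struct batch *b)
{
	struct req *rq = b->req_list;

	while (rq) {
		struct req *next = rq->next;

		printf("completing request %d\n", rq->id);
		rq = next;
	}
	b->req_list = NULL;
}

int main(void)
{
	DEFINE_BATCH(iob);
	struct req reqs[3] = { { .id = 1 }, { .id = 2 }, { .id = 3 } };
	int found = 0;

	/* "Drain the completion queue": accumulate instead of completing. */
	for (int i = 0; i < 3; i++) {
		handle_entry(&iob, &reqs[i]);
		found++;
	}

	/*
	 * One batched completion after the loop, mirroring the check on
	 * iob.req_list before nvme_pci_complete_batch() in the patch.
	 */
	if (found && iob.req_list)
		complete_batch(&iob);

	return 0;
}

The design point mirrored from the patch is that the batch lives on the
caller's stack and the flush only happens when the request list is
non-NULL, which is consistent with the observation above that even the
single-entry case shows no measurable difference versus completing inline.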