On Tue, Oct 12, 2021 at 12:17:42PM -0600, Jens Axboe wrote:
> Trivial to do now, just need our own io_batch on the stack and pass that
> in to the usual command completion handling.
>
> I pondered making this dependent on how many entries we had to process,
> but even for a single entry there's no discernable difference in
> performance or latency. Running a sync workload over io_uring:
>
> t/io_uring -b512 -d1 -s1 -c1 -p0 -F1 -B1 -n2 /dev/nvme1n1 /dev/nvme2n1
>
> yields the below performance before the patch:
>
> IOPS=254820, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
> IOPS=251174, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
> IOPS=250806, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
>
> and the following after:
>
> IOPS=255972, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
> IOPS=251920, BW=123MiB/s, IOS/call=1/1, inflight=(1 1)
> IOPS=251794, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
>
> which definitely isn't slower, about the same if you factor in a bit of
> variance. For peak performance workloads, benchmarking shows a 2%
> improvement.
>
> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> ---
>  drivers/nvme/host/pci.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 4713da708cd4..fb3de6f68eb1 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1076,8 +1076,10 @@ static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
>
>  static inline int nvme_process_cq(struct nvme_queue *nvmeq)
>  {
> +	struct io_batch ib;
>  	int found = 0;
>
> +	ib.req_list = NULL;

Is this really more efficient than

	struct io_batch ib = { };

?
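
As an aside, a minimal userspace sketch of the two forms being weighed
(the struct definition below is a stand-in; the real io_batch in this
series may carry more than the request list):

	#include <stddef.h>
	#include <stdio.h>

	/* Hypothetical stand-in for the kernel's struct io_batch; assumed
	 * here to hold only the request list, which may not match the
	 * actual definition in the series. */
	struct io_batch {
		void *req_list;
	};

	int main(void)
	{
		/* Form used in the patch: initialize only the member that
		 * the completion path consumes. */
		struct io_batch a;
		a.req_list = NULL;

		/* Form asked about above: an empty initializer zeroes every
		 * member (GNU C / C23 syntax, common in kernel code). */
		struct io_batch b = { };

		printf("%p %p\n", a.req_list, b.req_list);
		return 0;
	}

If io_batch really only holds the list pointer, both forms should boil
down to the same single store; the explicit assignment would only save
work if the struct grows members the fast path never touches.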