Re: [PATCH 6/9] nvme: add support for batched completion of polled IO

Christoph Hellwig <hch@xxxxxxxxxxxxx> · Wed, 13 Oct 2021 08:08:55 +0100

On Tue, Oct 12, 2021 at 12:17:39PM -0600, Jens Axboe wrote:
> Take advantage of struct io_batch, if passed in to the nvme poll handler.
> If it's set, rather than complete each request individually inline, store
> them in the io_batch list. We only do so for requests that will complete
> successfully, anything else will be completed inline as before.
> 
> Add an mq_ops->complete_batch() handler to do the post-processing of
> the io_batch list once polling is complete.
> 
> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
> ---
>  drivers/nvme/host/pci.c | 69 +++++++++++++++++++++++++++++++++++++----
>  1 file changed, 63 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 4ad63bb9f415..4713da708cd4 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -959,7 +959,7 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	return ret;
>  }
>  
> -static void nvme_pci_complete_rq(struct request *req)
> +static void nvme_pci_unmap_rq(struct request *req)
>  {
>  	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>  	struct nvme_dev *dev = iod->nvmeq->dev;
> @@ -969,9 +969,34 @@ static void nvme_pci_complete_rq(struct request *req)
>  			       rq_integrity_vec(req)->bv_len, rq_data_dir(req));
>  	if (blk_rq_nr_phys_segments(req))
>  		nvme_unmap_data(dev, req);
> +}
> +
> +static void nvme_pci_complete_rq(struct request *req)
> +{
> +	nvme_pci_unmap_rq(req);
>  	nvme_complete_rq(req);
>  }
>  
> +static void nvme_pci_complete_batch(struct io_batch *ib)
> +{
> +	struct request *req;
> +
> +	req = ib->req_list;
> +	while (req) {
> +		nvme_pci_unmap_rq(req);
> +		if (req->rq_flags & RQF_SPECIAL_PAYLOAD)
> +			nvme_cleanup_cmd(req);
> +		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
> +				req_op(req) == REQ_OP_ZONE_APPEND)
> +			req->__sector = nvme_lba_to_sect(req->q->queuedata,
> +					le64_to_cpu(nvme_req(req)->result.u64));
> +		req->status = nvme_error_status(nvme_req(req)->status);
> +		req = req->rq_next;
> +	}
> +
> +	blk_mq_end_request_batch(ib);

I hate all this open coding.  All the common logic needs to be in a
common helper.  Also please add a for_each macro for the request list
iteration.  I already thought about that for the batch allocation in
for-next, but with ever more callers this becomes essential.

> +	if (!nvme_try_complete_req(req, cqe->status, cqe->result)) {
> +		enum nvme_disposition ret;
> +
> +		ret = nvme_decide_disposition(req);
> +		if (unlikely(!ib || req->end_io || ret != COMPLETE)) {
> +			nvme_pci_complete_rq(req);

This actually is the likely case as only polling ever does the batch
completion.  In doubt I'd prefer if we can avoid these likely/unlikely
annotations as much as possible.

> +		} else {
> +			req->rq_next = ib->req_list;
> +			ib->req_list = req;
> +		}

And all this list manipulation should use proper helper.

> +	}

Also - can you look into turning this logic into an inline function with
a callback for the driver?  I think in general gcc will avoid the
indirect call for function pointers passed directly.  That way we can
keep a nice code structure but also avoid the indirections.

Same for nvme_pci_complete_batch.