On 10/13/21 1:08 AM, Christoph Hellwig wrote:
> On Tue, Oct 12, 2021 at 12:17:39PM -0600, Jens Axboe wrote:
>> Take advantage of struct io_batch, if passed in to the nvme poll handler.
>> If it's set, rather than complete each request individually inline, store
>> them in the io_batch list. We only do so for requests that will complete
>> successfully, anything else will be completed inline as before.
>>
>> Add an mq_ops->complete_batch() handler to do the post-processing of
>> the io_batch list once polling is complete.
>>
>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>> ---
>>  drivers/nvme/host/pci.c | 69 +++++++++++++++++++++++++++++++++++++----
>>  1 file changed, 63 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 4ad63bb9f415..4713da708cd4 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -959,7 +959,7 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>>  	return ret;
>>  }
>>
>> -static void nvme_pci_complete_rq(struct request *req)
>> +static void nvme_pci_unmap_rq(struct request *req)
>>  {
>>  	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>>  	struct nvme_dev *dev = iod->nvmeq->dev;
>> @@ -969,9 +969,34 @@ static void nvme_pci_complete_rq(struct request *req)
>>  			       rq_integrity_vec(req)->bv_len, rq_data_dir(req));
>>  	if (blk_rq_nr_phys_segments(req))
>>  		nvme_unmap_data(dev, req);
>> +}
>> +
>> +static void nvme_pci_complete_rq(struct request *req)
>> +{
>> +	nvme_pci_unmap_rq(req);
>>  	nvme_complete_rq(req);
>>  }
>>
>> +static void nvme_pci_complete_batch(struct io_batch *ib)
>> +{
>> +	struct request *req;
>> +
>> +	req = ib->req_list;
>> +	while (req) {
>> +		nvme_pci_unmap_rq(req);
>> +		if (req->rq_flags & RQF_SPECIAL_PAYLOAD)
>> +			nvme_cleanup_cmd(req);
>> +		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
>> +		    req_op(req) == REQ_OP_ZONE_APPEND)
>> +			req->__sector = nvme_lba_to_sect(req->q->queuedata,
>> +					le64_to_cpu(nvme_req(req)->result.u64));
>> +		req->status = nvme_error_status(nvme_req(req)->status);
>> +		req = req->rq_next;
>> +	}
>> +
>> +	blk_mq_end_request_batch(ib);
>
> I hate all this open coding. All the common logic needs to be in a
> common helper.

I'll see if I can unify this a bit.

> Also please add a for_each macro for the request list
> iteration. I already thought about that for the batch allocation in
> for-next, but with ever more callers this becomes essential.

Added a prep patch with list helpers, current version is using those
now.

>> +	if (!nvme_try_complete_req(req, cqe->status, cqe->result)) {
>> +		enum nvme_disposition ret;
>> +
>> +		ret = nvme_decide_disposition(req);
>> +		if (unlikely(!ib || req->end_io || ret != COMPLETE)) {
>> +			nvme_pci_complete_rq(req);
>
> This actually is the likely case as only polling ever does the batch
> completion. In doubt I'd prefer if we can avoid these likely/unlikely
> annotations as much as possible.

If you look at the end of the series, IRQ is wired up for it too. But I
do agree with the unlikely, I generally dislike them too. I'll just
kill this one.

>> +		} else {
>> +			req->rq_next = ib->req_list;
>> +			ib->req_list = req;
>> +		}
>
> And all this list manipulation should use proper helper.

Done

>> +	}
>
> Also - can you look into turning this logic into an inline function with
> a callback for the driver? I think in general gcc will avoid the
> indirect call for function pointers passed directly. That way we can
> keep a nice code structure but also avoid the indirections.
>
> Same for nvme_pci_complete_batch.

Not sure I follow.
It's hard to do a generic callback for this, as the batch can live
outside the block layer through the plug. That's why it's passed the
way it is in terms of completion hooks.

-- 
Jens Axboe
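
[The prep patch with the list helpers mentioned above is not quoted in
this mail. A minimal sketch of what such helpers could look like, built
only on the req->rq_next and ib->req_list usage shown in the quoted
patch; the names below are illustrative and need not match the actual
prep patch:]

/* prepend a request to a singly linked, rq_next-chained list */
static inline void req_list_add(struct request **list, struct request *rq)
{
	rq->rq_next = *list;
	*list = rq;
}

/* walk the list without removing entries */
#define req_list_for_each(list, pos) \
	for (pos = *(list); pos; pos = pos->rq_next)

[With helpers along those lines, the open-coded insertion in the poll
path would become req_list_add(&ib->req_list, req), and the while loop
in nvme_pci_complete_batch() would become
req_list_for_each(&ib->req_list, req).]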