On 10/13/21 3:09 AM, John Garry wrote:
> On 12/10/2021 19:17, Jens Axboe wrote:
>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>> ---
>>   drivers/nvme/host/pci.c | 69 +++++++++++++++++++++++++++++++++++++----
>>   1 file changed, 63 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 4ad63bb9f415..4713da708cd4 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -959,7 +959,7 @@ static blk_status_t nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>>  	return ret;
>>  }
>>
>> -static void nvme_pci_complete_rq(struct request *req)
>> +static void nvme_pci_unmap_rq(struct request *req)
>>  {
>>  	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>>  	struct nvme_dev *dev = iod->nvmeq->dev;
>> @@ -969,9 +969,34 @@ static void nvme_pci_complete_rq(struct request *req)
>>  			       rq_integrity_vec(req)->bv_len, rq_data_dir(req));
>>  	if (blk_rq_nr_phys_segments(req))
>>  		nvme_unmap_data(dev, req);
>> +}
>> +
>> +static void nvme_pci_complete_rq(struct request *req)
>> +{
>> +	nvme_pci_unmap_rq(req);
>>  	nvme_complete_rq(req);
>>  }
>>
>> +static void nvme_pci_complete_batch(struct io_batch *ib)
>> +{
>> +	struct request *req;
>> +
>> +	req = ib->req_list;
>> +	while (req) {
>> +		nvme_pci_unmap_rq(req);
>
> This will do the DMA SG unmap per request. Often this is a performance
> bottleneck when we have an IOMMU enabled in strict mode. So since we
> complete in batches, could we combine all the SGs in the batch into one
> big DMA unmap, rather than unmapping one-by-one?

It is indeed; I actually have a patch for persistent maps as well. But
even without that, it would make sense to handle these unmaps a bit
smarter. That requires some IOMMU work, though, which I'm not that
interested in right now. It could be done on top of this one by someone
motivated enough.

-- 
Jens Axboe
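
To make John's suggestion concrete, here is a rough sketch (not from this
series) of the kind of IOMMU-side batching it would take: unmap each
request's IOVA range with iommu_unmap_fast() while accumulating the
invalidations in an iommu_iotlb_gather, then pay for a single IOTLB flush
for the whole batch. The iod->first_iova and iod->iova_len fields, the
rq_next batch link, and nvme_pci_unmap_batch() itself are hypothetical;
the real driver maps through the DMA API rather than the raw IOMMU API,
which is exactly the missing "iommu work" Jens mentions.

#include <linux/blk-mq.h>
#include <linux/iommu.h>

/*
 * Hypothetical sketch only: defer the IOTLB invalidation across the
 * whole completion batch instead of flushing once per request.
 * iod->first_iova and iod->iova_len are made-up bookkeeping fields,
 * and req->rq_next is assumed to be the batch list link.
 */
static void nvme_pci_unmap_batch(struct device *dmadev, struct io_batch *ib)
{
	struct iommu_domain *domain = iommu_get_domain_for_dev(dmadev);
	struct iommu_iotlb_gather gather;
	struct request *req;

	iommu_iotlb_gather_init(&gather);

	/* Tear down each request's mapping, deferring the TLB flush. */
	for (req = ib->req_list; req; req = req->rq_next) {
		struct nvme_iod *iod = blk_mq_rq_to_pdu(req);

		iommu_unmap_fast(domain, iod->first_iova, iod->iova_len,
				 &gather);
	}

	/* One IOTLB invalidation for the whole batch. */
	iommu_iotlb_sync(domain, &gather);
}

In non-strict (lazy) IOMMU mode the DMA layer already defers and batches
these invalidations via flush queues; the sketch just shows what claiming
the same win explicitly at completion time could look like under strict
mode.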