On Tue, Jan 19, 2021 at 10:00 AM Christoph Hellwig <hch@xxxxxx> wrote:
>
> On Tue, Jan 19, 2021 at 09:53:36AM -0800, Marc Orr wrote:
> > This patch ensures that when `nvme_map_data()` fails to map the
> > addresses in a scatter/gather list:
> >
> > * The addresses are not incorrectly unmapped. The underlying
> >   scatter/gather code unmaps the addresses after detecting a failure.
> >   Thus, unmapping them again in the driver is a bug.
> > * The DMA pool allocations are not deallocated when they were never
> >   allocated.
> >
> > The bug that motivated this patch was the following sequence, which
> > occurred within the NVMe driver, with the kernel flag `swiotlb=force`.
> >
> > * NVMe driver calls dma_direct_map_sg()
> > * dma_direct_map_sg() fails part way through the scatter/gather list
> > * dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any
> >   entries that succeeded
> > * NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
> >   double unmap, which is a bug
> >
> > Before this patch, I observed intermittent application- and VM-level
> > failures when running a benchmark, fio, in an AMD SEV guest. This
> > patch resolves the failures.
>
> I think the right way to fix this is to just do a proper unwind instead
> of calling a catchall function. Can you try this patch?

Done. It works great, thanks! Shall I send out a v2 with what you've
proposed?

> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 25456d02eddb8c..47d7075053b6b2 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -842,7 +842,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>  	sg_init_table(iod->sg, blk_rq_nr_phys_segments(req));
>  	iod->nents = blk_rq_map_sg(req->q, req, iod->sg);
>  	if (!iod->nents)
> -		goto out;
> +		goto out_free_sg;
>
>  	if (is_pci_p2pdma_page(sg_page(iod->sg)))
>  		nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
> @@ -851,16 +851,25 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
>  		nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
>  					     rq_dma_dir(req), DMA_ATTR_NO_WARN);
>  	if (!nr_mapped)
> -		goto out;
> +		goto out_free_sg;
>
>  	iod->use_sgl = nvme_pci_use_sgls(dev, req);
>  	if (iod->use_sgl)
>  		ret = nvme_pci_setup_sgls(dev, req, &cmnd->rw, nr_mapped);
>  	else
>  		ret = nvme_pci_setup_prps(dev, req, &cmnd->rw);
> -out:
>  	if (ret != BLK_STS_OK)
> -		nvme_unmap_data(dev, req);
> +		goto out_dma_unmap;
> +	return BLK_STS_OK;
> +
> +out_dma_unmap:
> +	if (is_pci_p2pdma_page(sg_page(iod->sg)))
> +		pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
> +				rq_dma_dir(req));
> +	else
> +		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));

Do you think it's worth hoisting this sg unmap snippet into a helper
that can be called from both here, as well as nvme_unmap_data()? (Rough
sketch of what I mean at the end of this mail.)

> +out_free_sg:
> +	mempool_free(iod->sg, dev->iod_mempool);
>  	return ret;
>  }
>
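To make that question concrete, here's the kind of helper I have in
mind -- a rough, untested sketch; the name nvme_unmap_sg() is just a
placeholder:

static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
{
	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);

	/*
	 * Mirror the mapping path: P2P DMA pages were mapped via
	 * pci_p2pdma_map_sg_attrs(), everything else via
	 * dma_map_sg_attrs(), so unmap with the matching call.
	 */
	if (is_pci_p2pdma_page(sg_page(iod->sg)))
		pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
				rq_dma_dir(req));
	else
		dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
}

The out_dma_unmap label in your patch and the existing
nvme_unmap_data() could then both call it instead of duplicating the
two branches.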
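Also, for anyone following along: the contract both versions of the fix
rely on is that when the mapping call returns 0, the DMA layer (here,
dma_direct_map_sg()) has already unwound any partially mapped entries,
so the caller must not unmap on the failure path. A generic sketch of
the caller-side pattern -- not NVMe-specific; the device, direction,
and error code are made up for illustration:

	/* needs <linux/dma-mapping.h> */
	nr_mapped = dma_map_sg_attrs(dev, sgl, nents, DMA_TO_DEVICE,
				     DMA_ATTR_NO_WARN);
	if (!nr_mapped)
		return -ENOMEM;	/* nothing mapped, nothing to unmap */

	/* ... program the hardware with the nr_mapped entries ... */

	/* On teardown, unmap with the original nents, not nr_mapped. */
	dma_unmap_sg(dev, sgl, nents, DMA_TO_DEVICE);

Calling dma_unmap_sg() in that failure path is exactly the double unmap
described in the commit message above.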