On 2022-06-29 06:07, Robin Murphy wrote: > On 2022-06-15 17:12, Logan Gunthorpe wrote: >> When a PCI P2PDMA page is seen, set the IOVA length of the segment >> to zero so that it is not mapped into the IOVA. Then, in finalise_sg(), >> apply the appropriate bus address to the segment. The IOVA is not >> created if the scatterlist only consists of P2PDMA pages. >> >> A P2PDMA page may have three possible outcomes when being mapped: >> 1) If the data path between the two devices doesn't go through >> the root port, then it should be mapped with a PCI bus address >> 2) If the data path goes through the host bridge, it should be mapped >> normally with an IOMMU IOVA. >> 3) It is not possible for the two devices to communicate and thus >> the mapping operation should fail (and it will return -EREMOTEIO). >> >> Similar to dma-direct, the sg_dma_mark_pci_p2pdma() flag is used to >> indicate bus address segments. On unmap, P2PDMA segments are skipped >> over when determining the start and end IOVA addresses. >> >> With this change, the flags variable in the dma_map_ops is set to >> DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages. >> >> Signed-off-by: Logan Gunthorpe <logang@xxxxxxxxxxxx> >> Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx> >> --- >> drivers/iommu/dma-iommu.c | 68 +++++++++++++++++++++++++++++++++++---- >> 1 file changed, 61 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c >> index f90251572a5d..b01ca0c6a7ab 100644 >> --- a/drivers/iommu/dma-iommu.c >> +++ b/drivers/iommu/dma-iommu.c >> @@ -21,6 +21,7 @@ >> #include <linux/iova.h> >> #include <linux/irq.h> >> #include <linux/list_sort.h> >> +#include <linux/memremap.h> >> #include <linux/mm.h> >> #include <linux/mutex.h> >> #include <linux/pci.h> >> @@ -1062,6 +1063,16 @@ static int __finalise_sg(struct device *dev, >> struct scatterlist *sg, int nents, >> sg_dma_address(s) = DMA_MAPPING_ERROR; >> sg_dma_len(s) = 0; >> + if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) { > > Logically, should we not be able to use sg_is_dma_bus_address() here? I > think it should be feasible, and simpler, to prepare the p2p segments > up-front, such that at this point all we need to do is restore the > original length (if even that, see below). Per my previous email, no, because sg_is_dma_bus_address() is not set yet and not meant to tell you something about the page. That flag will be set below by pci_p2pdma_map_bus_segment() and then checkd in iommu_dma_unmap_sg() to determine if the dma_address in the segment needs to be unmapped. > >> + if (i > 0) >> + cur = sg_next(cur); >> + >> + pci_p2pdma_map_bus_segment(s, cur); >> + count++; >> + cur_len = 0; >> + continue; >> + } >> + >> /* >> * Now fill in the real DMA data. If... >> * - there is a valid output segment to append to >> @@ -1158,6 +1169,8 @@ static int iommu_dma_map_sg(struct device *dev, >> struct scatterlist *sg, >> struct iova_domain *iovad = &cookie->iovad; >> struct scatterlist *s, *prev = NULL; >> int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs); >> + struct dev_pagemap *pgmap = NULL; >> + enum pci_p2pdma_map_type map_type; >> dma_addr_t iova; >> size_t iova_len = 0; >> unsigned long mask = dma_get_seg_boundary(dev); >> @@ -1193,6 +1206,35 @@ static int iommu_dma_map_sg(struct device *dev, >> struct scatterlist *sg, >> s_length = iova_align(iovad, s_length + s_iova_off); >> s->length = s_length; >> + if (is_pci_p2pdma_page(sg_page(s))) { >> + if (sg_page(s)->pgmap != pgmap) { >> + pgmap = sg_page(s)->pgmap; >> + map_type = pci_p2pdma_map_type(pgmap, dev); >> + } > > There's a definite code smell here, but per above and below I think we > *should* actually call the new helper instead of copy-pasting half of it. > >> + >> + switch (map_type) { >> + case PCI_P2PDMA_MAP_BUS_ADDR: >> + /* >> + * A zero length will be ignored by >> + * iommu_map_sg() and then can be detected > > If that is required behaviour then it needs an explicit check in > iommu_map_sg() to guarantee (and document) it. It's only by chance that > __iommu_map() happens to return success for size == 0 *if* all the other > arguments still line up, which is a far cry from a safe no-op. What should such a check look like? I could certainly add some comments to iommu_map_sg(), but I don't see what the code would need to check for... > However, rather than play yet more silly tricks, I think it would make > even more sense to make iommu_map_sg() properly aware and able to skip > direct p2p segments on its own. Once it becomes normal to pass mixed > scatterlists around, it's only a matter of time until one ends up being > handed to a driver which manages its own IOMMU domain, and then what? I suppose we can add another call to is_pci_p2pdma_page() inside iommu_map_sg() if you think that is cleaner. Seems like more work on the fast path to me, but I'm not opposed to it. >> + * in __finalise_sg() to actually map the >> + * bus address. >> + */ >> + s->length = 0; >> + continue; >> + case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: >> + /* >> + * Mapping through host bridge should be >> + * mapped with regular IOVAs, thus we >> + * do nothing here and continue below. >> + */ >> + break; >> + default: >> + ret = -EREMOTEIO; >> + goto out_restore_sg; >> + } >> + } >> + >> /* >> * Due to the alignment of our single IOVA allocation, we can >> * depend on these assumptions about the segment boundary mask: >> @@ -1215,6 +1257,9 @@ static int iommu_dma_map_sg(struct device *dev, >> struct scatterlist *sg, >> prev = s; >> } >> + if (!iova_len) >> + return __finalise_sg(dev, sg, nents, 0); >> + >> iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), >> dev); >> if (!iova) { >> ret = -ENOMEM; >> @@ -1236,7 +1281,7 @@ static int iommu_dma_map_sg(struct device *dev, >> struct scatterlist *sg, >> out_restore_sg: >> __invalidate_sg(sg, nents); >> out: >> - if (ret != -ENOMEM) >> + if (ret != -ENOMEM && ret != -EREMOTEIO) >> return -EINVAL; >> return ret; >> } >> @@ -1244,7 +1289,7 @@ static int iommu_dma_map_sg(struct device *dev, >> struct scatterlist *sg, >> static void iommu_dma_unmap_sg(struct device *dev, struct >> scatterlist *sg, >> int nents, enum dma_data_direction dir, unsigned long attrs) >> { >> - dma_addr_t start, end; >> + dma_addr_t end, start = DMA_MAPPING_ERROR; > > There are several things I don't like about this logic, I'd rather have > "end = 0" here... Ok, I think that should work. >> struct scatterlist *tmp; >> int i; >> @@ -1260,14 +1305,22 @@ static void iommu_dma_unmap_sg(struct device >> *dev, struct scatterlist *sg, >> * The scatterlist segments are mapped into a single >> * contiguous IOVA allocation, so this is incredibly easy. >> */ > > [ This comment rather stops being true :( ] Not exactly. Sure there are some segments in the SGL that have bus addresses, but all the regular IOVAs still have a single contiguous allocation and only require one call to __iommu_dma_unmap(). The only trick issues is finding the first and last actual IOVA SG to get the range. > >> - start = sg_dma_address(sg); >> - for_each_sg(sg_next(sg), tmp, nents - 1, i) { > > ...then generalise the first-element special case here into a dedicated > "walk to the first non-p2p element" loop... Ok, I'll see what I can do for that. Logan