On 2022-06-30 08:56, Robin Murphy wrote: > On 2022-06-29 23:41, Logan Gunthorpe wrote: >> >> >> On 2022-06-29 13:15, Robin Murphy wrote: >>> On 2022-06-29 16:57, Logan Gunthorpe wrote: >>>> >>>> >>>> >>>> On 2022-06-29 06:07, Robin Murphy wrote: >>>>> On 2022-06-15 17:12, Logan Gunthorpe wrote: >>>>>> When a PCI P2PDMA page is seen, set the IOVA length of the segment >>>>>> to zero so that it is not mapped into the IOVA. Then, in >>>>>> finalise_sg(), >>>>>> apply the appropriate bus address to the segment. The IOVA is not >>>>>> created if the scatterlist only consists of P2PDMA pages. >>>>>> >>>>>> A P2PDMA page may have three possible outcomes when being mapped: >>>>>> 1) If the data path between the two devices doesn't go through >>>>>> the root port, then it should be mapped with a PCI bus >>>>>> address >>>>>> 2) If the data path goes through the host bridge, it should be >>>>>> mapped >>>>>> normally with an IOMMU IOVA. >>>>>> 3) It is not possible for the two devices to communicate and >>>>>> thus >>>>>> the mapping operation should fail (and it will return >>>>>> -EREMOTEIO). >>>>>> >>>>>> Similar to dma-direct, the sg_dma_mark_pci_p2pdma() flag is used to >>>>>> indicate bus address segments. On unmap, P2PDMA segments are skipped >>>>>> over when determining the start and end IOVA addresses. >>>>>> >>>>>> With this change, the flags variable in the dma_map_ops is set to >>>>>> DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for P2PDMA pages. >>>>>> >>>>>> Signed-off-by: Logan Gunthorpe <logang@xxxxxxxxxxxx> >>>>>> Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx> >>>>>> --- >>>>>> drivers/iommu/dma-iommu.c | 68 >>>>>> +++++++++++++++++++++++++++++++++++---- >>>>>> 1 file changed, 61 insertions(+), 7 deletions(-) >>>>>> >>>>>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c >>>>>> index f90251572a5d..b01ca0c6a7ab 100644 >>>>>> --- a/drivers/iommu/dma-iommu.c >>>>>> +++ b/drivers/iommu/dma-iommu.c >>>>>> @@ -21,6 +21,7 @@ >>>>>> #include <linux/iova.h> >>>>>> #include <linux/irq.h> >>>>>> #include <linux/list_sort.h> >>>>>> +#include <linux/memremap.h> >>>>>> #include <linux/mm.h> >>>>>> #include <linux/mutex.h> >>>>>> #include <linux/pci.h> >>>>>> @@ -1062,6 +1063,16 @@ static int __finalise_sg(struct device *dev, >>>>>> struct scatterlist *sg, int nents, >>>>>> sg_dma_address(s) = DMA_MAPPING_ERROR; >>>>>> sg_dma_len(s) = 0; >>>>>> + if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) { >>>>> >>>>> Logically, should we not be able to use sg_is_dma_bus_address() >>>>> here? I >>>>> think it should be feasible, and simpler, to prepare the p2p segments >>>>> up-front, such that at this point all we need to do is restore the >>>>> original length (if even that, see below). >>>> >>>> Per my previous email, no, because sg_is_dma_bus_address() is not set >>>> yet and not meant to tell you something about the page. That flag will >>>> be set below by pci_p2pdma_map_bus_segment() and then checkd in >>>> iommu_dma_unmap_sg() to determine if the dma_address in the segment >>>> needs to be unmapped. >>> >>> I know it's not set yet as-is; I'm suggesting things should be >>> restructured so that it *would be*. In the logical design of this code, >>> the DMA addresses are effectively determined in iommu_dma_map_sg(), and >>> __finalise_sg() merely converts them from a relative to an absolute form >>> (along with undoing the other trickery). Thus the call to >>> pci_p2pdma_map_bus_segment() absolutely belongs in the main >>> iommu_map_sg() loop. >> >> I don't see how that can work: __finalise_sg() does more than convert >> them from relative to absolute, it also figures out which SG entry will >> contain which dma_address segment. Which segment a P2PDMA address needs >> to be programmed into depends on the how 'cur' is calculated which in >> turn depends on things like seg_mask and max_len. This calculation is >> not done in iommu_dma_map_sg() so I don't see how there's any hope of >> assigning the bus address for the P2P segments in that function. >> >> If there's a way to restructure things so that's possible that I'm not >> seeing, I'm open to it but it's certainly not immediately obvious. > > Huh? It's still virtually the same thing; iommu_dma_map_sg() calls > pci_p2pdma_map_bus_segment(s) and sets s->length to 0 if > PCI_P2PDMA_MAP_BUS_ADDR, then __finalise_sg() can use > sg_is_dma_bus_address(s) in place of is_pci_p2pdma_page(sg_page(s)), and > just propagate the DMA address and original length from s to cur. > > Here you've written a patch which looks to correctly interrupt any > ongoing concatenation state and convey some data from the given input > segment to the appropriate output segment, so I'm baffled by why you'd > think you couldn't do what you've already done. Ah, I understand now, thanks for the patience. It took me a couple of read throughs before I got it, but I figured it out and now have a working implementation that looks really nice. It's a big improvement not needing the two different P2PDMA helpers. I'll send a v8 of just the first 13 patches next week after a bit more testing. I've put a draft git branch here if you want to look at it before that: https://github.com/sbates130272/linux-p2pmem p2pdma_map_v8 Thanks! Logan