On Wed, Sep 07, 2022 at 01:12:52PM -0300, Jason Gunthorpe wrote:
> The PCI offset is some embedded thing - I've never seen it in a server
> platform.

That's not actually true, e.g. some Power systems definitely had it,
although I don't know if the current ones do.

But that's not the point.  The offset is a configuration fully
supported by Linux, and something that just works when using the proper
APIs.  Hand-waving about it being embedded-only or bad design doesn't
matter.  There is a reason why we have these proper APIs, and no one
has any business bypassing them.

> I also seem to remember that iommu and PCI offset don't play nice
> together - so for the VFIO use case where the iommu is present I'm
> pretty sure we can very safely assume 0 offset. That seems confirmed
> by the fact that VFIO has never handled PCI offset in its own P2P path
> and P2P works fine in VMs across a wide range of platforms.

I think the offset is one of the reasons why IOVA windows can be
reserved (and maybe also why ppc is so weird).

> So, would you be OK with this series if I try to make a dma_map_p2p()
> that resolves the offset issue?

Well, if it also solves the other issue of invalid scatterlists leaking
outside of drm we can think about it.

> > Last but not least I don't really see how the code would even work
> > when an IOMMU is used, as dma_map_resource will return an IOVA that
> > is only understood by the IOMMU itself, and not the other endpoint.
>
> I don't understand this.
>
> __iommu_dma_map() will put the given phys into the iommu_domain
> associated with 'dev' and return the IOVA it picked.

Yes, __iommu_dma_map creates an IOVA for the mapped remote BAR.  That is
the right thing if the I/O goes through the host bridge, but it is the
wrong thing if the I/O goes through the switch - in that case the IOVA
generated is not something that the endpoint that owns the BAR can even
understand.

Take a look at iommu_dma_map_sg and pci_p2pdma_map_segment to see how
this is handled.
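
To illustrate the distinction, here is a rough userspace sketch of the
decision, not the kernel implementation.  Every name in it (enum
p2p_path, struct p2p_target, model_iommu_map, model_p2p_dma_addr) is
made up for illustration, the "IOMMU" is a stand-in that returns an
arbitrary fake IOVA, and the offset convention (bus address minus CPU
physical address) is an assumption of the model:

```c
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum p2p_path {
	P2P_PATH_THRU_HOST_BRIDGE,	/* traffic is routed up through the host bridge */
	P2P_PATH_BUS_ADDR,		/* traffic is routed peer-to-peer by a switch */
};

struct p2p_target {
	uint64_t cpu_phys;	/* CPU physical address of the remote BAR */
	int64_t bus_offset;	/* assumed: PCI bus address minus CPU physical address */
};

/* Stand-in for an IOMMU picking an IOVA: a fixed, made-up window. */
static uint64_t model_iommu_map(uint64_t phys)
{
	return 0xffff0000ull + (phys & 0xfffull);
}

/*
 * Address the initiating device should be programmed with.
 *
 * Through the host bridge the IOMMU translates, so an IOVA is correct.
 * Through a switch nothing translates, so the peer only understands its
 * own PCI bus address, which differs from the CPU physical address
 * whenever the platform has a non-zero PCI offset.
 */
static uint64_t model_p2p_dma_addr(const struct p2p_target *t,
				   enum p2p_path path)
{
	if (path == P2P_PATH_BUS_ADDR)
		return t->cpu_phys + (uint64_t)t->bus_offset;
	return model_iommu_map(t->cpu_phys);
}
```

With cpu_phys = 0x80001000 and bus_offset = -0x40000000, the switch
path yields the bus address 0x40001000, while the host bridge path
yields whatever IOVA the model's fake allocator hands out.  Handing
that IOVA to a peer behind the switch is the failure mode described
above.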