On 2021-10-27 5:11 p.m., Bjorn Helgaas wrote:
>> @@ -532,6 +577,9 @@ calc_map_type_and_dist(struct pci_dev *provider, struct pci_dev *client,
>>  		map_type = PCI_P2PDMA_MAP_NOT_SUPPORTED;
>>  	}
>>  done:
>> +	if (pci_10bit_tags_unsupported(client, provider, verbose))
>> +		map_type = PCI_P2PDMA_MAP_NOT_SUPPORTED;
>
> I need to be convinced that this check is in the right spot to catch
> all potential P2PDMA situations.  The pci_p2pmem_find() and
> pci_p2pdma_distance() interfaces eventually call
> calc_map_type_and_dist().  But those interfaces don't actually produce
> DMA bus addresses, and I'm not convinced that all P2PDMA users use
> them.
>
> nvme *does* use them, but infiniband (rdma_rw_map_sg()) does not, and
> it calls pci_p2pdma_map_sg().

The rule in the current code is that calc_map_type_and_dist() must be
called before pci_p2pdma_map_sg(). The calc function caches the mapping
type in an xarray. If it was not called ahead of time,
pci_p2pdma_map_type() will return PCI_P2PDMA_MAP_NOT_SUPPORTED, and the
WARN_ON_ONCE in pci_p2pdma_map_sg_attrs() will be hit. (There's a rough
sketch of this ordering at the bottom of this mail.)

Both NVMe and RDMA (the latter only used for P2PDMA in the nvme fabrics
code) do the correct thing here, so we can be sure
calc_map_type_and_dist() is called before any pages are mapped.

The patch set I'm currently working on will ensure that
calc_map_type_and_dist() is called before anyone maps a PCI P2PDMA page
with dma_map_sg*().

> amdgpu_dma_buf_attach() calls pci_p2pdma_distance_many() but I don't
> know where it sets up P2PDMA transactions.

The amdgpu driver hacked this in before proper support was done, but at
least it's using pci_p2pdma_distance_many(), presumably before trying
any transfer. Though it's likely broken: it doesn't take the mapping
type into account, so I think it always assumes the traffic goes
through the host bridge (seeing as it doesn't use pci_p2pdma_map_sg()).

> cxgb4 and qed mention "peer2peer", but I don't know whether they are
> related; they don't seem to use any pci_p2p.* interfaces.

I'm really not sure what these drivers are doing at all. However, I
think this is unrelated, based on this old patch description[1]:

    Open MPI, Intel MPI and other applications don't support the iWARP
    requirement that the client side send the first RDMA message. This
    class of application connection setup is called peer-2-peer.
    Typically once the connection is setup, _both_ sides want to send
    data. This patch enables supporting peer-2-peer over the chelsio
    rnic by enforcing this iWARP requirement in the driver itself as
    part of RDMA connection setup.

Logan

[1] http://lkml.iu.edu/hypermail/linux/kernel/0804.3/1416.html
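
For reference, here is the rough, hand-wavy sketch of the ordering rule
mentioned above. It is not actual kernel code: p2p_setup_and_map() and
the local variable names are made up for illustration; only the
pci_p2pdma_* calls, PCI_P2PDMA_MAP_NOT_SUPPORTED and DMA_BIDIRECTIONAL
are real.

#include <linux/pci-p2pdma.h>
#include <linux/scatterlist.h>
#include <linux/dma-direction.h>

/*
 * Hypothetical caller: dma_dev is the device doing the DMA (the
 * client), provider is the device whose P2P memory backs sgl.
 */
static int p2p_setup_and_map(struct pci_dev *provider,
			     struct device *dma_dev,
			     struct scatterlist *sgl, int nents)
{
	struct device *clients[] = { dma_dev };

	/*
	 * Step 1: query the distance up front. This ends up in
	 * calc_map_type_and_dist(), which caches the mapping type for
	 * this provider/client pair in the xarray.
	 */
	if (pci_p2pdma_distance_many(provider, clients, 1, true) < 0)
		return -EINVAL;

	/*
	 * Step 2: only now map the P2PDMA pages. If step 1 were
	 * skipped there would be no cached entry, pci_p2pdma_map_type()
	 * would return PCI_P2PDMA_MAP_NOT_SUPPORTED, and the
	 * WARN_ON_ONCE in pci_p2pdma_map_sg_attrs() would fire.
	 */
	return pci_p2pdma_map_sg(dma_dev, sgl, nents, DMA_BIDIRECTIONAL);
}

nvme gets step 1 via pci_p2pmem_find() rather than calling the distance
function directly, but as noted above both of those interfaces end up
in calc_map_type_and_dist(), so the effect is the same.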