On Wed, Mar 14, 2018 at 10:17:34AM -0600, Logan Gunthorpe wrote:
> On 13/03/18 08:56 PM, Bjorn Helgaas wrote:
> > I agree that peers need to have a common upstream bridge.  I think
> > you're saying peers need to have *two* common upstream bridges.  If I
> > understand correctly, requiring two common bridges is a way to ensure
> > that peers directly below Root Ports don't try to DMA to each other.
>
> No, I don't get where you think we need to have two common upstream
> bridges. I'm not sure when such a case would ever happen. But you seem
> to understand based on what you wrote below.

Sorry, I phrased that wrong.  You don't require two common upstream
bridges; you require two upstream bridges, with the upper one being
common, i.e.,

  static struct pci_dev *get_upstream_bridge_port(struct pci_dev *pdev)
  {
          struct pci_dev *up1, *up2;

          up1 = pci_dev_get(pci_upstream_bridge(pdev));
          if (!up1)
                  return NULL;    /* directly below a Root Port or host
                                     bridge: no second upstream bridge */

          up2 = pci_dev_get(pci_upstream_bridge(up1));
          pci_dev_put(up1);       /* drop the reference taken on up1 */

          return up2;             /* NULL unless there are two
                                     upstream bridges */
  }

So if you're starting with pdev, up1 is the immediately upstream bridge
and up2 is the second upstream bridge.  If this is PCIe, either up1 is
a Root Port and there is no up2, or up1 and up2 are in a switch.

This is more restrictive than the spec requires.  As long as there is a
single common upstream bridge, peer-to-peer DMA should work.  In fact,
in conventional PCI, I think the upstream bridge could even be the host
bridge (not a PCI-to-PCI bridge).

You are focused on PCIe systems, and in those systems most topologies
do have an upstream switch, which means two upstream bridges.  I'm
trying to remove that assumption because I don't think there's a
requirement for it in the spec.  Enforcing this assumption complicates
the code and makes it harder to understand because the reader says
"huh, I know peer-to-peer DMA should work inside any PCI hierarchy*,
so why do we need these two bridges?"

[*] For conventional PCI, this means anything below the same host
bridge.  Two devices on a conventional PCI root bus should be able to
DMA to each other, even though there's no PCI-to-PCI bridge above them.
For PCIe, it means a "hierarchy domain" as used in PCIe r4.0, sec
1.3.1, i.e., anything below the same Root Port.
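To make that concrete, a check along the lines I'm arguing for might
look something like this (a rough, untested sketch; the function name
is made up, and it uses only the standard pci_upstream_bridge()
helper).  It succeeds as soon as the two devices have *any* single
common upstream bridge:

  /*
   * Sketch only: return true if "a" and "b" have any common upstream
   * bridge.  For PCIe, if the common bridge is a Root Port, both
   * devices are in the same hierarchy domain.  This still misses the
   * conventional PCI case where the only common ancestor is the host
   * bridge itself (no PCI-to-PCI bridge above the peers).
   */
  static bool pci_have_common_upstream_bridge(struct pci_dev *a,
                                              struct pci_dev *b)
  {
          struct pci_dev *up_a, *up_b;

          for (up_a = pci_upstream_bridge(a); up_a;
               up_a = pci_upstream_bridge(up_a))
                  for (up_b = pci_upstream_bridge(b); up_b;
                       up_b = pci_upstream_bridge(up_b))
                          if (up_a == up_b)
                                  return true;

          return false;
  }

With a walk like that, there's nothing to special-case about the number
of bridges, and peers directly below the same Root Port fall out
naturally.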
> > So I guess the first order of business is to nail down whether
> > peers below a Root Port are prohibited from DMAing to each other.
> > My assumption, based on 6.12.1.2 and the fact that I haven't yet
> > found a prohibition, is that they can.
>
> If you have a multifunction device designed to DMA to itself below a
> root port, it can. But determining this is on a device by device
> basis, just as determining whether a root complex can do peer to peer
> is on a per device basis. So I'd say we don't want to allow it by
> default and let someone who has such a device figure out what's
> necessary if and when one comes along.

It's not the job of this infrastructure to answer the device-dependent
question of whether DMA initiators or targets support peer-to-peer DMA.
All we want to do here is figure out whether the PCI topology supports
it, using the mechanisms guaranteed by the spec.  We can derive that
from the basic rules about how PCI bridges work, i.e., from the
PCI-to-PCI Bridge spec r1.2, sec 4.3:

  A bridge forwards PCI memory transactions from its primary interface
  to its secondary interface (downstream) if a memory address is in the
  range defined by the Memory Base and Memory Limit registers (when the
  base is less than or equal to the limit), as illustrated in Figure
  4-3.  Conversely, a memory transaction on the secondary interface
  that is within this address range will not be forwarded upstream to
  the primary interface.  Any memory transactions on the secondary
  interface that are outside this address range will be forwarded
  upstream to the primary interface (provided they are not in the
  address range defined by the prefetchable memory address range
  registers).

This works for either PCI or PCIe.  The only wrinkle PCIe adds is that
the very top of the hierarchy is a Root Port, and we can't rely on it
to route traffic to other Root Ports.  I also doubt Root Complex
Integrated Endpoints can participate in peer-to-peer DMA.

Thanks for your patience in working through all this.  I know it
sometimes feels like being bounced around in all directions.  It's
just a normal consequence of trying to add complex functionality to an
already complex system, with interest and expertise spread unevenly
across a crowd of people.

Bjorn