On Fri, Mar 23, 2018 at 03:59:14PM -0600, Logan Gunthorpe wrote: > On 23/03/18 03:50 PM, Bjorn Helgaas wrote: > > Popping way up the stack, my original point was that I'm trying to > > remove restrictions on what devices can participate in > > peer-to-peer DMA. I think it's fairly clear that in conventional > > PCI, any devices in the same PCI hierarchy, i.e., below the same > > host-to-PCI bridge, should be able to DMA to each other. > > Yup, we are working on this. > > > The routing behavior of PCIe is supposed to be compatible with > > conventional PCI, and I would argue that this effectively requires > > multi-function PCIe devices to have the internal routing required > > to avoid the route-to-self issue. > > That would be very nice but many devices do not support the internal > route. We've had to work around this in the past and as I mentioned > earlier that NVMe devices have a flag indicating support. However, > if a device wants to be involved in P2P it must support it and we > can exclude devices that don't support it by simply not enabling > their drivers. Do you think these devices that don't support internal DMA between functions are within spec, or should we handle them as exceptions, e.g., via quirks? If NVMe defines a flag indicating peer-to-peer support, that would suggest to me that these devices are within spec. I looked up the CMBSZ register you mentioned (NVMe 1.3a, sec 3.1.12). You must be referring to the WDS, RDS, LISTS, CQS, and SQS bits. If WDS is set, the controller supports having Write-related data and metadata in the Controller Memory Buffer. That would mean the driver could put certain queues in controller memory instead of in host memory. The controller could then read the queue from its own internal memory rather than issuing a PCIe transaction to read it from host memory. That makes sense to me, but I don't see the connection to peer-to-peer. There's no multi-function device in this picture, so it's not about internal DMA between functions. WDS, etc., tell us about capabilities of the controller. If WDS is set, the CPU (or a peer PCIe device) can write things to controller memory. If it is clear, neither the CPU nor a peer device can put things there. So it doesn't seem to tell us anything about peer-to-peer specifically. It looks like information needed by the NVMe driver, but not by the PCI core. Bjorn