On Thu, May 07, 2015 at 12:16:30PM -0500, Bjorn Helgaas wrote:
> On Thu, May 7, 2015 at 11:23 AM, William Davis <wdavis@xxxxxxxxxx> wrote:
> >> From: Bjorn Helgaas [mailto:bhelgaas@xxxxxxxxxx]
> >> Sent: Thursday, May 7, 2015 8:13 AM
> >> To: Yijing Wang
> >> Cc: William Davis; Joerg Roedel; open list:INTEL IOMMU (VT-d);
> >> linux-pci@xxxxxxxxxxxxxxx; Terence Ripperda; John Hubbard; Jerome Glisse;
> >> Dave Jiang; David S. Miller; Alex Williamson
> >> Subject: Re: [PATCH 0/6] IOMMU/DMA map_resource support for peer-to-peer
> >>
> >> On Wed, May 6, 2015 at 8:48 PM, Yijing Wang <wangyijing@xxxxxxxxxx> wrote:
> >> > On 2015/5/7 6:18, Bjorn Helgaas wrote:
> >> >> [+cc Yijing, Dave J, Dave M, Alex]
> >> >>
> >> >> On Fri, May 01, 2015 at 01:32:12PM -0500, wdavis@xxxxxxxxxx wrote:
> >> >>> From: Will Davis <wdavis@xxxxxxxxxx>
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> This patch series adds DMA APIs to map and unmap a struct resource
> >> >>> to and from a PCI device's IOVA domain, and implements the AMD,
> >> >>> Intel, and nommu versions of these interfaces.
> >> >>>
> >> >>> This solves a long-standing problem with the existing DMA-remapping
> >> >>> interfaces, which require that a struct page be given for the region
> >> >>> to be mapped into a device's IOVA domain. This requirement cannot
> >> >>> support peer device BAR ranges, for which no struct pages exist.
> >> >>> ...
> >>
> >> >> I think we currently assume there's no peer-to-peer traffic.
> >> >>
> >> >> I don't know whether changing that will break anything, but I'm
> >> >> concerned about these:
> >> >>
> >> >> - PCIe MPS configuration (see pcie_bus_configure_settings()).
> >> >
> >> > I think it should be ok for PCIe MPS configuration: PCIE_BUS_PEER2PEER
> >> > forces every device's MPS to 128B, and what it cares about is the TLP
> >> > payload size. In this series, it seems to only map an IOVA for a device
> >> > BAR region.
> >>
> >> MPS configuration makes assumptions about whether there will be any
> >> peer-to-peer traffic. If there will be none, MPS can be configured more
> >> aggressively.
> >>
> >> I don't think Linux has any way to detect whether a driver is doing
> >> peer-to-peer, and there's no way to prevent a driver from doing it.
> >> We're stuck with requiring the user to specify boot options
> >> ("pci=pcie_bus_safe", "pci=pcie_bus_perf", "pci=pcie_bus_peer2peer",
> >> etc.) that tell the PCI core what the user expects to happen.
> >>
> >> This is a terrible user experience. The user has no way to tell what
> >> drivers are going to do. If he specifies the wrong thing, e.g., "assume no
> >> peer-to-peer traffic," and then loads a driver that does peer-to-peer, the
> >> kernel will configure MPS aggressively and when the device does a
> >> peer-to-peer transfer, it may cause a Malformed TLP error.
> >>
> >
> > I agree that this isn't a great user experience, but just want to clarify
> > that this problem is orthogonal to this patch series, correct?
> >
> > Prior to this series, the MPS mismatch is still possible with p2p traffic,
> > but when an IOMMU is enabled p2p traffic will result in DMAR faults. The
> > aim of the series is to allow drivers to fix the latter, not the former.
>
> Prior to this series, there wasn't any infrastructure for drivers to
> do p2p, so it was mostly reasonable to assume that there *was* no p2p
> traffic.
>
> I think we currently default to doing nothing to MPS. Prior to this
> series, it might have been reasonable to optimize based on a "no-p2p"
> assumption, e.g., default to pcie_bus_safe or pcie_bus_perf. After
> this series, I'm not sure what we could do, because p2p will be much
> more likely.
>
> It's just an issue; I don't know what the resolution is.

Can't we just have each device update its MPS at runtime? So if device A
decides to map something from device B, then device A updates the MPS of
both A and B to the lowest common supported value.
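A minimal C sketch of that runtime rule, assuming a simplified model in which each device has a supported maximum (MPSS) and a currently programmed MPS, both in bytes. The `struct p2p_dev` type and both functions are hypothetical illustrations, not kernel APIs, and the model ignores the switch ports on the peer-to-peer path, whose MPS would also matter. `tlp_payload_ok()` shows why a mismatch hurts: a receiver treats a TLP whose payload exceeds its own MPS as malformed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one PCIe function; not a kernel structure. */
struct p2p_dev {
    int mpss; /* Max_Payload_Size Supported by the device, in bytes */
    int mps;  /* Max_Payload_Size currently programmed, in bytes */
};

/* Receiver-side check: a TLP carrying more data than the receiver's
 * MPS is treated as a Malformed TLP. */
static bool tlp_payload_ok(const struct p2p_dev *rx, int payload_bytes)
{
    return payload_bytes <= rx->mps;
}

/* Sketch of the proposed rule: when A maps a BAR of B, program both to
 * the lower of their current MPS settings. Using the current values
 * rather than MPSS never raises either device above a limit already
 * chosen for its upstream path. */
static void p2p_pair_mps(struct p2p_dev *a, struct p2p_dev *b)
{
    int common = a->mps < b->mps ? a->mps : b->mps;

    assert(common <= a->mpss && common <= b->mpss);
    a->mps = common;
    b->mps = common;
}
```

For example, with A at 256 bytes and B at 512, a 512-byte TLP from B would be malformed at A; after `p2p_pair_mps(&a, &b)` both are at 256. Taking the lower current setting instead of the lower MPSS is a conservative reading of "lowest common supported value".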
Of course, you need to keep track of that per device: if a device C comes
around and wants to exchange with device B, and both C and B support a
higher payload than A, then C reprogramming B to that higher value will
trigger issues for A.

I know we update other PCIe configuration parameters at runtime for GPUs,
but I don't know whether that is widely tested for other devices.

Cheers,
Jérôme
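The per-device tracking in the A/B/C scenario above could be sketched as grouping: every device that exchanges p2p traffic with another, directly or through a shared peer, joins one group, and all members are programmed to the smallest MPS in that group. This is a hypothetical C sketch (the arrays, `p2p_init()` and `p2p_link()` are illustrative names, not kernel APIs). It ignores switch ports on the path, never splits a group when a mapping is torn down, and assumes MPS can be changed safely while devices are in use, which is the open question raised in the last paragraph.

```c
#include <assert.h>

#define MAX_DEVS 16

/* Hypothetical bookkeeping, not an existing kernel API: devices that
 * exchange p2p traffic, directly or via a shared peer, share a group id. */
static int group[MAX_DEVS]; /* group id of each device */
static int mps[MAX_DEVS];   /* currently programmed MPS, in bytes */

static void p2p_init(int n, const int *initial_mps)
{
    assert(n <= MAX_DEVS);
    for (int d = 0; d < n; d++) {
        group[d] = d;            /* each device starts in its own group */
        mps[d] = initial_mps[d];
    }
}

/* Record that x and y exchange p2p traffic: merge their groups, then
 * program every member to the smallest MPS in the merged group, so a
 * new peer can never raise a device above what another member accepts. */
static void p2p_link(int n, int x, int y)
{
    assert(x >= 0 && x < n && y >= 0 && y < n);
    int gx = group[x], gy = group[y];
    int lo = mps[x];

    /* Move every member of y's group into x's group. */
    for (int d = 0; d < n; d++)
        if (group[d] == gy)
            group[d] = gx;
    /* Find the smallest MPS in the merged group. */
    for (int d = 0; d < n; d++)
        if (group[d] == gx && mps[d] < lo)
            lo = mps[d];
    /* Program all members to it. */
    for (int d = 0; d < n; d++)
        if (group[d] == gx)
            mps[d] = lo;
}
```

With A at 128 bytes and B and C at 512, linking A with B lowers B to 128; linking C with B afterwards leaves B at 128 and lowers C as well, instead of letting C raise B back to 512.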