On Fri, Feb 03, 2023 at 12:20:45PM -0600, Bjorn Helgaas wrote: > > The PCI rules are a bit complicated: > > - A simple MemRd/Wr with a PASID will be routed according to the > > address. This can be mis-routed > > - A ATS translation request with a PASID is always routed to the host > > bridge > > From a PCIe point of view, I think these cases are equivalent because > a PASID prefix doesn't affect routing (sec 2.2.10.4). A Translation > Request includes an Untranslated Address, and if that happens to look > like a PCI bus address, I think it will be mis-routed just like a > MemRd/Wr would be. So, I trusted AMD when they said this is why their platform worked, I didn't check carefully the ATS spec language. But looking now, I agree. I don't see any language that says ATS is routed differently, if anything it says it is routed the same way as MemRd. AMD folks? If yes then this has all been the wrong direction. > > > Do IOMMUs allocate (PASID, Untranslated Addresses) that look like > > > PCI bus addresses? > > > > Yes, because it is mapped to a mm_struct userspace can use any mmap > > to access any valid address as an IOVA and thus PASID tagged > > translation must never become confused with bus addresses. > > If PCI bus addresses are carved out of the Translated Address arena, > the MemRd/Wr TLPs should be fine, Yes, that is my point that Translated requests are fine because they already carry a Translated address, and by its nature a Translated address cannot be mis-routed. The spec actually talks about this whole issue see the NOTE at the end of 10.1.1 Per that note, for Linux, the "implementation constraint on the TA" is that the TA requires ACS so that Untranslated always routes to the TA. The AMD defect is some BIOS's on some of their SOC's don't report ACS for the MFD, even though it does presumably implement ACS. > Are we on track to fix this before v6.2? AMD? Did the patches you were talking about to fix the oops in the AMD driver land? Can we rely on that to fix the black screen for v6.2? Otherwise I think the PCI core is correct here and I'm loath to change it to work around a GPU driver bug. The GPU driver should understand that PASID is not supported because the PCI core says so. It shouldn't crash and blackscreen. Keep in mind, from a PCI perspective, this protection is a security senstive check, described in the spec. If we drop it we expose platforms using SVA to a potential security issue upon mis-configuration. So, I'd much rather leave the PCI alone and see that the AMD driver is fixed. Jason