On Tue, 31 Jan 2023 22:36:27 -0400
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Tue, Jan 31, 2023 at 06:14:19PM -0600, Bjorn Helgaas wrote:
>
> > > AMD GPU is one of those devices.
> >
> > I guess you mean the AMD GPU has ATS, PRI, and PASID Capabilities?
> > And furthermore, that the GPU *always* uses Translated addresses with
> > PASID?
>
> I'm not versed in the spec lingo, but the GPU issues MemRd/Wrs with
> the translated bit set and no PASID header - which is the correct form
> for an address that was translated by ATS.

FWIW there is a capability bit and an enable bit in the PASID cap/control
registers that say whether a device can/should add a PASID to a
translated request or not. I think the intent is that a host can sanity
check AT requests to make sure the device isn't making them up. To do
that it needs the PASID. Not sure any hosts do this yet though ;)

Not worth much, but I thought it always sent the PASID, so I dug out the
spec to check (I was wrong, as it is both optional and configurable).

>
> To get to that it issues ATS requests, and only the ATS related
> requests will carry the PASID.
>
> ATS related requests always route to the root port, which is why it is
> functionally equivalent to ACS RR/UF in these cases.
>
> Translated requests always route where they are supposed to go, even
> with P2P and things.
>
> > And this applies even if there is no ACS or ACS doesn't support
> > PCI_ACS_RR and PCI_ACS_UF.
> >
> > The black screen happens because ... ?
>
> AMD GPU driver bugs blow up if it cannot setup PASID.
>
> > I couldn't figure out the NULL pointer dereference. I expected it to
> > be from a BUG() or similar in report_iommu_fault(), but I don't see
> > that.
>
> IIRC it is a buggy error unwind handling in the AMD GPU driver.
>
> Jason
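
To make the capability/enable bit point concrete, here is a minimal
sketch of the check a host could do on the PASID capability and control
register values. This is an illustration only, not kernel code: the
function name and the two XLATED masks are hypothetical, and the bit
positions (bit 3 in both registers) are an assumption from my reading of
the spec that should be verified against the PASID Extended Capability
definition before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/* PASID Enable, bit 0 of the PASID control register. */
#define PASID_CTRL_ENABLE       0x0001u
/* ASSUMED bit positions (hypothetical names), verify against the spec: */
#define PASID_CAP_XLATED_PASID  0x0008u /* Translated Requests with PASID Supported */
#define PASID_CTRL_XLATED_PASID 0x0008u /* Translated Requests with PASID Enable */

/*
 * Return true if a device with these capability/control register values
 * would be expected to attach a PASID to Translated requests.
 * All three conditions must hold: PASID is enabled, the device
 * advertises support, and the feature itself is enabled.
 */
bool xlated_req_carries_pasid(uint16_t cap, uint16_t ctrl)
{
	if (!(ctrl & PASID_CTRL_ENABLE))
		return false;
	if (!(cap & PASID_CAP_XLATED_PASID))
		return false;
	return (ctrl & PASID_CTRL_XLATED_PASID) != 0;
}
```

With neither bit set, the device behaves as Jason describes: Translated
MemRd/Wr with no PASID, and only the ATS related requests carry it.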