On Sat, Jun 30, 2018 at 10:45:21AM -0400, Sinan Kaya wrote:
> On 6/29/2018 8:49 PM, Bjorn Helgaas wrote:
> > On Tue, Jun 19, 2018 at 10:14:46PM -0400, Sinan Kaya wrote:
> >> A PCIe endpoint carries the process address space identifier (PASID) in
> >> the TLP prefix as part of the memory read/write transaction. The address
> >> information in the TLP is relevant only for a given PASID context.
> >>
> >> An IOMMU takes PASID value and the address information from the
> >> TLP to look up the physical address in the system.
> >>
> >> If a bridge drops the TLP prefix, the translation agent can resolve the
> >> address to an incorrect location and cause data corruption. Prevent
> >> this condition by requiring End-to-End TLP prefix to be supported on the
> >> entire data path between the endpoint and the root port.
> >
> > PASID is an End-End TLP Prefix (PCIe r4.0, sec 6.20). Sec 2.2.10.2 says
> >
> >   It is an error to receive a TLP with an End-End TLP Prefix by a
> >   Receiver that does not support End-End TLP Prefixes. A TLP in
> >   violation of this rule is handled as a Malformed TLP. This is a
> >   reported error associated with the Receiving Port (see Section 6.2).
> >
> > So I agree that we shouldn't enable PASID in an endpoint unless all
> > the switch ports leading to it support End-End prefixes. But I don't
> > see how a bridge can drop a prefix and cause data corruption -- if it
> > doesn't support End-End prefixes, shouldn't the bridge raise a
> > Malformed TLP error instead of forwarding the TLP?
>
> It should under normal circumstances.
>
> I remember reading that most PCIe switches don't support TLP prefixes.
> I don't know if it is because of buggy behavior or if it is just plain
> unsupported while dropping the request as Malformed TLP.
>
> I was trying to be proactive and not enable PASID if the entire path
> is incapable.

Absolutely, that makes perfect sense. Much better to fail to enable
PASID rather than enabling it and seeing Malformed TLP errors or data
corruption later.

I was trying to figure out if you can actually force data corruption
this way. If you can, I'd say that sounds like a buggy switch that we
might want to be aware of.

Bjorn
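
For illustration only, the path check discussed in this thread could be
modeled roughly as below. This is a user-space sketch, not the patch
itself: struct model_dev, path_supports_ee_prefix() and
model_enable_pasid() are hypothetical names, and the ee_prefix_supported
flag merely stands in for the End-End TLP Prefix Supported capability
that each port would advertise.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device model; NOT the kernel's struct pci_dev. */
struct model_dev {
	struct model_dev *upstream;	/* bridge above us; NULL at the root port */
	bool ee_prefix_supported;	/* stand-in for End-End TLP Prefix support */
};

/*
 * Walk from the device's upstream bridge to the root port.  Return true
 * only if every port on the way supports End-End TLP prefixes.
 */
static bool path_supports_ee_prefix(const struct model_dev *dev)
{
	const struct model_dev *br;

	for (br = dev->upstream; br != NULL; br = br->upstream) {
		if (!br->ee_prefix_supported)
			return false;
	}
	return true;
}

/*
 * Refuse to enable PASID (return -1) unless the endpoint and the whole
 * path above it can carry End-End TLP prefixes; return 0 otherwise.
 */
static int model_enable_pasid(const struct model_dev *ep)
{
	if (!ep->ee_prefix_supported || !path_supports_ee_prefix(ep))
		return -1;
	return 0;
}
```

With an endpoint below a switch port that lacks prefix support,
model_enable_pasid() fails up front, which is the "fail to enable"
behavior preferred above over seeing Malformed TLP errors later.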