> From: Tian, Kevin > Sent: Friday, January 5, 2024 10:16 AM > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx> > > Sent: Thursday, January 4, 2024 10:37 PM > > > > On Thu, Dec 14, 2023 at 07:26:39PM +0800, Yi Liu wrote: > > > Per the prior discussion[1], we agreed to move the error reporting into > the > > > driver specific part. On Intel side, we want to report two devTLB > > > invalidation errors: ICE (invalid completion error) and ITE (invalidation > > > timeout error). Such errors have an additional SID information to tell > > > which device failed the devTLB invalidation. I've got the below structure. > > > > IMHO all of this complexity is a consequence of the decision to hide > > the devtlb invalidation from the VM.. > > > > On the other hand I guess you want to do this because of the SIOV > > troubles where the vPCI function in the VM is entirely virtual and > > can't be trivially mapped to a real PCI function for ATC invalidation > > like ARM and AMD can do (but they also can't support SIOV because of > > this). :( > > > > However it also makes it very confusing about how the VM would > > perceive an error - eg if it invalidates an SIOV device single PASID > > and that devtlb fails then the error should be connected back to the > > vPCI function for the SIOV's specific PASID and not back to the > > physical PCI function for the SIOV owner. > > > > As the iommu driver itself has no idea about the vPCI functions this > > seems like it is going to get really confusing. The API I suggested in > > the other email is not entirely going to work as the vPCI function for > > SIOV cases will have to be identified by the (struct device, PASID) - > > while it would be easy enough for the iommu driver to provide the > > PASID, I'm not sure how the iommufd core will relate the PASID back to > > the iommu_device to understand SIOV without actually being aware of > > SIOV to some degree :\ > > we plan to add such awareness with a new binding helper: > > https://lore.kernel.org/lkml/20231009085123.463179-4-yi.l.liu@xxxxxxxxx/ > > and with metadata to track association between PASID's and iommufd vdev. > > but in reality the relation could be identified in an easy way due to a SIOV > restriction which we discussed before - shared PASID space of PF disallows > assigning sibling vdev's to a same VM (otherwise no way to identify which > sibling vdev triggering an iopf when a pasid is used on both vdev's). That > restriction implies that within an iommufd context every iommufd_device > object should contain a unique struct device pointer. So PASID can be > instead ignored in the lookup then just always do iommufd_get_dev_id() > using struct device. A bit more background. Previously we thought this restriction only applies to SIOV+vSVA, as a guest process may bind to both sibling vdev's, leading to the same pasid situation. In concept w/o vSVA it's still possible to assign sibling vdev's to a same VM as each vdev is allocated with a unique pasid to mark vRID so can be differentiated from each other in the fault/error path. But when looking at this err code issue with Yi closely, we found there is another gap in the VT-d spec. Upon devtlb invalidation timeout the hw doesn't report pasid in the error info register. this makes it impossible to identify the source vdev if a hwpt invalidation request involves sibling vdev's from a same PF. with that I'm inclined to always imposing this restriction for SIOV. One may argue that SIOV w/o vSVA w/o devtlb is conceptually immune but I'm with you that given SIOVr1 is one-off I prefer to limiting its usability other than complexing the kernel.