On Wed, 24 Jul 2024 02:26:20 +0000
"Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:

> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Tuesday, April 30, 2024 1:45 AM
> >
> > On Fri, Apr 26, 2024 at 02:13:54PM -0600, Alex Williamson wrote:
> > > Regarding "if we accept that text file configuration should be
> > > something the VMM supports", I'm not on board with this yet, so
> > > applying it to PASID discussion seems premature.
> >
> > Sure, I'm just explaining a way this could all fit together.
> >
>
> Thinking more along this direction.
>
> I'm not sure how long it will take to standardize such text files and
> share them across VMMs. We may need a way to move in steps in
> Qemu to unblock the kernel development toward that end goal, e.g.
> first accepting a pasid option plus user-specified offset (if offset
> unspecified then auto-pick one in cap holes). Later when the text
> file is ready then such per-cap options can be deprecated.

Planned obsolescence is a hard sell.

> This simple way won't fix the migration issue, but at least it's on
> par with physical caps (i.e. fail the migration if offset mismatched
> between dest/src) and both will be fixed when the text file model
> is ready.
>
> Then look at what uAPI is required to report the vPASID cap.
>
> In earlier discussion it's leaning toward extending GET_HW_INFO
> in iommufd given both iommu/pci support are required to get
> PASID working and iommu driver will not report such support until
> pasid has been enabled in both iommu/pci. With that there is no
> need to further report PASID in vfio-pci.
>
> But there may be other caps which are shared between VF and
> PF while having nothing to do with the iommu. e.g. the Device
> Serial Number extended cap (permitted but not recommended
> in VF). If there is a need to report such cap on VF which doesn't
> implement it to userspace, a vfio uAPI (device_feature or a new
> one dedicated to synthetical vcap) appears to be inevitable.
>
> So I wonder whether we leave this part untouched until a real
> demand comes or use vpasid to formalize that uAPI to be forward
> looking. If in the end such uAPI will exist then it's a bit weird to
> have PASID escaped (especially when vfio-pci already reports
> PRI/ATS which have iommu dependency too in vconfig).
>
> In concept the Qemu logic will be clearer if any PCI caps (real
> or synthesized) is always conveyed via vfio-pci while iommufd is
> for identifying a viommu cap.

There are so many moving pieces here and the discussion trailed off a
long time ago. I have trouble keeping all the relevant considerations
in my head, so let me try to enumerate them, please correct/add.

 - The PASID capability cannot be implemented on VFs per the PCIe spec.
   All VFs share the PF PASID configuration. This also implies that
   the VF PASID capability is essentially emulated since the VF driver
   cannot manipulate the PF PASID directly.
 - VFIO does not currently expose the PASID capability for PFs, nor
   does anything construct a vPASID capability for VFs.
 - The PASID capability is only useful in combination with a vIOMMU
   with PASID support, which does not yet exist in QEMU.
 - Some devices are known to place registers in configuration space,
   outside of the capability chains, which historically makes it
   difficult to place a purely virtual capability without potentially
   masking such hidden registers. Current virtual capabilities are
   placed at vendor defined fixed locations to avoid conflicts.
 - There is some expectation that otherwise compatible devices may not
   present identical capability chains, for example devices running
   different firmware or devices from different vendors implementing a
   standard register ABI (virtio) where capability chain layout is not
   standardized.
 - There have been arguments that the layout of device capabilities is
   a policy choice, where both the kernel and libvirt traditionally
   try to avoid making policy decisions.
 - Seamless live migration of devices requires that configuration
   space remains at least consistent, if not identical for much of it.
   Capability offsets cannot change during live migration. This leads
   to the text file reference above, which is essentially just the
   notion that if the VMM defines the capability layout in config
   space, it would need to do so via a static reference, independent
   of the layout of the physical device and we might want to share
   that among multiple VMMs.
 - For a vfio-pci device to support live migration it must be enabled
   to do so by a vfio-pci variant driver.
 - We've discussed in the community and seem to have a consensus that
   a DVSEC (Designated Vendor Specific Extended Capability) could be
   defined to describe unused configuration space. Such a DVSEC could
   be implemented natively by the device or supplied by a vfio-pci
   variant driver. There is currently no definition of such a DVSEC.

So what are we trying to accomplish here? PASID is the first
non-device specific virtual capability that we'd like to insert into
the VM view of the capability chain. It won't be the last.

 - Do we push the policy of defining the capability offset to the
   user?
 - Do we do some hand waving that devices supporting PASID shouldn't
   have hidden registers and therefore the VMM can simply find a gap?
 - Do we ask the hardware vendor or variant driver to insert a DVSEC
   to identify available config space?
 - Do we handle this as just another device quirk, where we maintain a
   table of supported devices and vPASID offset for each?
 - Do we consider this an inflection point where the VMM entirely
   takes over the layout of the capability spaces to impose a stable
   migration layout? On what basis do we apply that inflection?
 - Also, do we require the same policy for both standard and extended
   capability chains?
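To make the "find a gap" and "quirk table" options above a bit more
concrete, here is a rough Python sketch of what the VMM-side policy
could look like for the extended capability chain. It is only an
illustration, not QEMU or kernel code. The function names, the quirk
table and its single entry (vendor 0xABCD, device 0x0001, offset
0x200) are all hypothetical. Treating every non-zero byte outside a
known capability as a possible hidden register is a heuristic assumed
for this sketch, and it would miss hidden registers that happen to
read as zero. The capability IDs and sizes are the ones I believe
match the PCIe spec (DSN 12 bytes, ATS 8, PRI 16, PASID 8).

```python
import struct

PCI_EXT_CAP_START = 0x100      # first extended capability header
PASID_CAP_SIZE = 8             # header + capability + control registers

# Sizes of a few fixed-length extended capabilities, keyed by cap ID.
# Anything not listed is assumed to cover only its 4-byte header.
EXT_CAP_SIZE = {0x03: 12, 0x0F: 8, 0x13: 16, 0x1B: 8}

# Hypothetical quirk table: (vendor, device) -> forced vPASID offset.
VPASID_QUIRKS = {(0xABCD, 0x0001): 0x200}


def walk_ext_caps(cfg):
    """Yield (offset, cap_id) for each extended capability in cfg."""
    offset = PCI_EXT_CAP_START
    seen = set()
    while offset and offset not in seen:   # stop on end of chain or loop
        seen.add(offset)
        (header,) = struct.unpack_from("<I", cfg, offset)
        if header in (0, 0xFFFFFFFF):      # empty chain or absent device
            break
        yield offset, header & 0xFFFF
        offset = (header >> 20) & 0xFFC    # next pointer, dword aligned


def find_gap(cfg, size):
    """Return the lowest dword-aligned free offset of `size` bytes, or None."""
    used = bytearray(len(cfg))
    for off, cap_id in walk_ext_caps(cfg):
        length = EXT_CAP_SIZE.get(cap_id, 4)
        for i in range(off, min(off + length, len(cfg))):
            used[i] = 1
    for i, byte in enumerate(cfg):         # heuristic: non-zero == in use
        if byte:
            used[i] = 1
    run = 0
    for off in range(PCI_EXT_CAP_START, len(cfg), 4):
        if any(used[off:off + 4]):
            run = 0
            continue
        run += 4
        if run >= size:
            return off + 4 - run
    return None


def pick_vpasid_offset(cfg):
    """Quirk table entry wins; otherwise fall back to the first gap."""
    vendor, device = struct.unpack_from("<HH", cfg, 0)
    quirk = VPASID_QUIRKS.get((vendor, device))
    return quirk if quirk is not None else find_gap(cfg, PASID_CAP_SIZE)
```

With this shape, source and target would each compute an offset from
their own view of config space, and a mismatch between the two is
exactly the case where migration would be failed.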

I understand the desire to make some progress, but QEMU relies on
integration with management tools, so a temporary option for a user to
specify a PASID offset in isolation sounds like a non-starter to me.

This might be a better sell if the user interface allowed fully
defining the capability chain layout from the command line and this
interface would continue to exist and supersede how the VMM might
otherwise define the capability chain when used. A fully user defined
layout would be complicated though, so I think there would still be a
desire for QEMU to consume or define a consistent policy itself.

Even if QEMU defines the layout for a device, there may be multiple
versions of that device. For example, maybe we just add PASID now, but
at some point we decide that we do want to replicate the PF serial
number capability. At that point we have versions of the device which
would need to be tied to versions of the machine and maybe also
selected via a profile switch on the device command line.

If we want to simplify this, maybe we do just look at whether the
vIOMMU is configured for PASID support and if the device supports it,
then we just look for a gap and add the capability. If we end up with
different results between source and target for migration, then
migration will fail. Possibly we end up with a quirk table to override
the default placement of specific capabilities on specific devices.
That might evolve into a lookup for where we place all capabilities,
which essentially turns into the "file" where the VMM defines the
entire layout for some devices.

This is already TL;DR, so I'll end with that before I further drown
the possibility of discussion. Thanks,

Alex