> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Thursday, August 1, 2024 1:05 AM
>
> On Wed, 31 Jul 2024 05:15:25 +0000
> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
>
> > > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > Sent: Wednesday, July 31, 2024 1:35 AM
> > >
> > > - Seamless live migration of devices requires that configuration space
> > >   remains at least consistent, if not identical for much of it.
> >
> > I didn't quite get it. I thought being consistent means fully identical
> > config space from guest p.o.v.
>
> See for example:
>
> https://gitlab.com/qemu-project/qemu/-/commit/187716feeba406b5a3879db66a7bafd687472a1f

Thanks!

>
> The layout of config space and most of the contents therein need to be
> identical, but there are arguably elements that could be volatile which
> only need to be consistent.

hmm IMHO it's more that the guest doesn't care about the volatile
content in that field, rather than the guest view being strictly
consistent. Probably I don't really understand the meaning of
consistency in this context...

btw that fix claims:

"
  Here consistency could mean that VSC format should be same on
  source and destination, however actual Vendor Specific Info may
  not be byte-to-byte identical.
"

Does it apply to all devices supporting VSC? It's OK for NVIDIA vGPU
but I'm not sure whether some vendor driver might be sensitive to
byte-to-byte consistency in VSC.

> > >
> > > - We've discussed in the community and seem to have a consensus that a
> > >   DVSEC (Designated Vendor Specific Extended Capability) could be
> > >   defined to describe unused configuration space. Such a DVSEC could
> > >   be implemented natively by the device or supplied by a vfio-pci
> > >   variant driver. There is currently no definition of such a DVSEC.
> >
> > I'm not sure whether DVSEC is still that necessary if the direction is
> > to go userspace-defined layout. In a synthetic world the unused
> > physical space doesn't really matter.
> >
> > So this consensus IMHO was better placed under the umbrella of
> > the other direction having the kernel define the layout.
>
> I agree that we don't seem to be headed in a direction that requires
> this, but I just wanted to include that there was a roughly agreed upon
> way for devices and variant drivers to annotate unused config space
> ranges for higher levels. If we head in a direction where the VMM
> chooses an offset for the PASID capability, we need to keep track of
> whether this DVSEC comes to fruition and how that affects the offset
> that QEMU might choose.

Yes. We can keep this option in case there is a demand, especially if
the file-based synthesized scheme won't be built in one day and we need
a default policy in VMM.

>
> > > So what are we trying to accomplish here. PASID is the first
> > > non-device specific virtual capability that we'd like to insert into
> > > the VM view of the capability chain. It won't be the last.
> > >
> > > - Do we push the policy of defining the capability offset to the user?
> >
> > Looks yes as I didn't see a strong argument for the opposite way.
>
> It's a policy choice though, so where and how is it implemented? It
> works fine for those of us willing to edit xml or launch VMs by command
> line, but libvirt isn't going to sign up to insert a policy choice for
> a device. If we get to even higher level tools, does anything that
> wants to implement PASID support required a vendor operator driver to
> make such policy choices (btw, I'm just throwing out the "operator"
> term as if I know what it means, I don't).

I had a rough feeling that there might be other usages requiring such
vendor plugin, e.g. provisioning VF/ADI may require vendor specific
configurations, but not really an expert in this area.

Overall I feel most of our discussions so far are about
VMM-auto-find-offset vs.
file-based-policy-scheme which both belong to user-defined policy,
suggesting that we all agreed to drop the other way having kernel define
the offset (plus in-kernel quirks, etc.)? Even the said DVSEC is to
assist such user-defined direction.

>
> > > - Do we do some hand waving that devices supporting PASID shouldn't
> > >   have hidden registers and therefore the VMM can simply find a gap?
> >
> > I assume 'handwaving' doesn't mean any measure in code to actually
> > block those devices (as doing so likely requires certain denylist based on
> > device/vendor ID but then why not going a step further to also hard
> > code an offset?). It's more a try-and-fail model where vPASID is opted
> > in via a cmdline parameter then a device with hidden registers may
> > misbehave if the VMM happens to find a conflict gap. And the impact
> > is restricted only to a new setup where the user is interested in
> > PASID to opt hence can afford diagnostics effort to figure out the
> > restriction.
>
> If you want to hard code an offset then we're effectively introducing a
> device specific quirk to enable PASID support. I thought we wanted
> this to work generically for any device exposing PASID, therefore I was
> thinking more of "find a gap" as the default strategy with quirks used
> to augment the resulting offset where necessary.
>
> I'd also be careful about command line parameters. I think we require
> one for the vIOMMU to enable PASID support, but I'd prefer to avoid one
> on the vfio-pci device, instead simply enabling support when both the
> vIOMMU support is enabled and the device is detected to support it.
> Each command line option requires support in the upper level tools to
> enable it.

Makes sense. btw will there be a requirement that the user wants to
disable PASID even if the device supports it, e.g. for testing purposes
or to work around a HW erratum disclosed after the host driver claims
the support in an old kernel?
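
To make the "find a gap by default, quirk where necessary" strategy
concrete, here is a minimal sketch in Python. It is not QEMU code: the
function names, the quirk table and the vendor/device IDs in it are all
hypothetical, and it assumes the caller already knows which ranges of
the extended config space are occupied. A hidden register is by
definition missing from that list, which is exactly the case a quirk
entry would have to cover.

```python
# Sketch only. All names and the quirk entry are hypothetical.
EXT_CFG_START = 0x100  # PCIe extended config space starts at 0x100
EXT_CFG_END = 0x1000   # and ends at 4 KiB
PASID_CAP_LEN = 8      # ext cap header (4) + capability (2) + control (2)

# Hypothetical quirk table: (vendor_id, device_id) -> forced PASID offset.
PASID_OFFSET_QUIRKS = {
    (0x1234, 0x5678): 0x400,  # made-up IDs, for illustration only
}


def find_gap(used, length, start=EXT_CFG_START, end=EXT_CFG_END):
    """Return the lowest dword-aligned free offset of `length` bytes, or None.

    `used` is an iterable of (offset, size) ranges known to be occupied.
    """
    cursor = start
    for off, size in sorted(used):
        if off - cursor >= length:
            break  # the hole in front of this range is big enough
        # Skip past this range, rounding up to a dword boundary.
        cursor = max(cursor, (off + size + 3) & ~3)
    return cursor if end - cursor >= length else None


def choose_pasid_offset(vendor_id, device_id, used):
    """Quirk entry wins if present, otherwise fall back to the first gap."""
    quirk = PASID_OFFSET_QUIRKS.get((vendor_id, device_id))
    if quirk is not None:
        return quirk
    return find_gap(used, PASID_CAP_LEN)


if __name__ == "__main__":
    # Three occupied ranges with a hole between 0x158 and 0x200.
    used = [(0x100, 0x48), (0x148, 0x10), (0x200, 0x20)]
    print(hex(choose_pasid_offset(0xABCD, 0x0001, used)))
```

With this shape the default needs no per-device option, and a device
that misbehaves with the automatically chosen offset only needs one
table entry rather than a new command line parameter.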
> > >
> > > I understand the desire to make some progress, but QEMU relies on
> > > integration with management tools, so a temporary option for a user to
> > > specify a PASID offset in isolation sounds like a non-starter to me.
> > >
> > > This might be a better sell if the user interface allowed fully
> > > defining the capability chain layout from the command line and this
> > > interface would continue to exist and supersede how the VMM might
> > > otherwise define the capability chain when used. A fully user defined
> > > layout would be complicated though, so I think there would still be a
> > > desire for QEMU to consume or define a consistent policy itself.
> > >
> > > Even if QEMU defines the layout for a device, there may be multiple
> > > versions of that device. For example, maybe we just add PASID now, but
> > > at some point we decide that we do want to replicate the PF serial
> > > number capability. At that point we have versions of the device which
> > > would need to be tied to versions of the machine and maybe also
> > > selected via a profile switch on the device command line.
> > >
> > > If we want to simplify this, maybe we do just look at whether the
> > > vIOMMU is configured for PASID support and if the device supports it,
> >
> > and this is related to the open which I raised in last mail - whether we
> > want to report the PASID support both in iommufd and vfio-pci uAPI.
> >
> > My impression is yes as there may be requirement of exposing a virtual
> > capability which doesn't rely on the IOMMU.
>
> What's the purpose of reporting PASID via both iommufd and vfio-pci? I
> agree that there will be capabilities related to the iommufd and
> capabilities only related to the device, but I disagree that that
> provides justification to report PASID via both uAPIs. Are we also
> going to ask iommufd to report that a device has an optional serial
> number capability? It clearly doesn't make sense for iommufd to be

Certainly no.
My point was that vfio-pci/iommufd each reports its own capability set.
They may overlap but this fact just matches the physical world.

> involved with that, so why does it make sense for vfio-pci to be
> involved in reporting something that is more iommufd specific?

It doesn't matter which one is more involved. It's more akin to the
physical world.

btw vfio-pci already reports ATS/PRI in vconfig space, both of which
rely on iommufd. Throwing PASID alone to the iommufd uAPI lacks a good
justification for why it's special.

I envision an extension to vfio device feature or a new vfio uAPI for
reporting virtual capabilities, to augment the ones filled in vconfig
space.

>
> > > then we just look for a gap and add the capability. If we end up with
> > > different results between source and target for migration, then
> > > migration will fail. Possibly we end up with a quirk table to override
> > > the default placement of specific capabilities on specific devices.
> >
> > emm how does a quirk table work with devices having volatile config
> > space layout cross FW versions? Can VMM assigned with a VF be able
> > to check the FW version of the PF?
>
> If the VMM can't find the same gap between source and destination then
> a quirk could make sure that the PASID offset is consistent. But also
> if the VMM doesn't find the same gap then that suggests the config
> space is already different and not only the offset of the PASID
> capability will need to be fixed via a quirk, so then we're into
> quirking the entire capability space for the device.

yes. So the quirk table is more for fixing the functional gap (i.e. not
overlapping with a hidden register) than for migration. As long as a
device can function correctly with it, the virtual caps fall under the
same restriction as physical caps in migration, i.e. upon inconsistent
layout between src/dest we'll need a separate way to synthesize the
entire space.
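
The "same restriction as physical caps" point can be illustrated with a
small Python sketch of a layout comparison between source and
destination. This is an illustration only: the helper name and the
dictionary representation are assumptions, not an existing QEMU or vfio
interface. The extended capability IDs used in the example (ATS 0x000F,
PRI 0x0013, PASID 0x001B) are the PCIe-defined ones; the offsets are
made up.

```python
# Sketch only: compares guest-visible capability layouts, not contents.
def layout_mismatches(src_caps, dst_caps):
    """Return the sorted capability IDs whose placement differs.

    Both arguments map an extended capability ID to its offset in the
    guest-visible config space. An ID present on one side only counts
    as a mismatch, whether the capability is physical or virtual.
    """
    ids = set(src_caps) | set(dst_caps)
    return sorted(i for i in ids if src_caps.get(i) != dst_caps.get(i))


if __name__ == "__main__":
    src = {0x000F: 0x100, 0x0013: 0x110, 0x001B: 0x158}
    dst = {0x000F: 0x100, 0x0013: 0x110, 0x001B: 0x200}
    bad = layout_mismatches(src, dst)
    if bad:
        print("migration blocked, differing caps:", [hex(i) for i in bad])
```

A check like this treats the virtual PASID capability exactly like any
physical one, so a differing PASID offset is reported the same way as
any other layout difference and does not need a PASID-specific path.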
>
> The VMM should not be assumed to have any additional privileges beyond
> what we provide it through the vfio device and iommufd interface.
> Testing anything about the PF would require access on the host that
> won't work in more secure environments. Therefore if we can't
> consistently place the PASID for a device, we probably need to quirk it
> based on the vendor/device IDs or sub-IDs or we need to rely on a
> management implied policy such as a device profile option on the QEMU
> command line or maybe different classes of the vfio-pci driver in QEMU.
>
> > > That might evolve into a lookup for where we place all capabilities,
> > > which essentially turns into the "file" where the VMM defines the entire
> > > layout for some devices.
> >
> > Overall this sounds a feasible path to move forward - starting with
> > the VMM to find the gap automatically if a new PASID option is
> > opted in. Devices with hidden registers may fail. Devices with volatile
> > config space due to FW upgrade or cross vendors may fail to migrate.
> > Then evolving it to the file-based scheme, and there is time to discuss
> > any intermediate improvement (fixed quirks, cmdline offset, etc.) in
> > between.
>
> As above, let's be careful about introducing unnecessary command line
> options, especially if we expect support for them in higher level
> tools. If we place the PASID somewhere that makes the device not work,
> then disabling PASID on the vIOMMU should resolve that. It won't be a

The vIOMMU is per-platform, so disabling PASID there applies to all
devices behind it, including those which don't have a problem with the
auto-selected offset. Not sure whether one would want to continue
enabling PASID for the other devices, or should stop immediately to
find a quirk for the problematic one and then resume.

> regression, it will only be an incompatibility with a new feature.
> That incompatibility may require a quirk to resolve to have the PASID
> placed somewhere else.
> If the PASID is placed at different offsets
> based on device firmware or vendor then the location of the PASID alone
> isn't the only thing preventing migration and we'll need to introduce
> code for the VMM to take ownership of the capability layout at that
> point. Thanks,
>

Yes, the migration issue might be solved in a separate track as it
applies to both physical and virtual capabilities.