RE: [PATCH v2 0/4] vfio-pci support pasid attach/detach

"Tian, Kevin" <kevin.tian@xxxxxxxxx> · Thu, 1 Aug 2024 07:45:43 +0000

> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Thursday, August 1, 2024 1:05 AM
> 
> On Wed, 31 Jul 2024 05:15:25 +0000
> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> 
> > > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > Sent: Wednesday, July 31, 2024 1:35 AM
> > >
> > >  - Seamless live migration of devices requires that configuration space
> > >    remains at least consistent, if not identical for much of it.
> >
> > I didn't quite get it. I thought being consistent means fully identical
> > config space from guest p.o.v.
> 
> See for example:
> 
> https://gitlab.com/qemu-project/qemu/-/commit/187716feeba406b5a3879db66a7bafd687472a1f

Thanks!

> 
> The layout of config space and most of the contents therein need to be
> identical, but there are arguably elements that could be volatile which
> only need to be consistent.

hmm IMHO it's more that the guest doesn't care about volatile content in
that field than that the guest view is strictly consistent. Probably I
don't really understand the meaning of consistency in this context...

btw that fix claims:

  "
  Here consistency could mean that VSC format should be same on
  source and destination, however actual Vendor Specific Info may
  not be byte-to-byte identical.
  "

Does it apply to all devices supporting VSC? It's OK for NVIDIA vGPU
but I'm not sure whether some vendor driver might be sensitive to
byte-to-byte consistency in VSC.
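
To make the distinction in that quoted text concrete, here is a rough
sketch of a "format only" check. It is my own illustration, not the code
in the QEMU commit: the function name and the VSC_* offsets are
hypothetical, and it assumes the conventional VSC header layout (ID,
next pointer, length byte) with the payload deliberately ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CAP_ID_VNDR 0x09 /* vendor-specific capability ID */
#define VSC_NEXT_OFF    1    /* next-capability pointer (hypothetical name) */
#define VSC_LEN_OFF     2    /* VSC length byte (hypothetical name) */
#define VSC_HDR_LEN     3    /* ID + next pointer + length */

/*
 * Return true when the VSC at 'pos' has the same format on source and
 * destination: same ID, same chain link, same length. The vendor
 * specific payload after the header is not compared at all.
 */
static bool vsc_format_consistent(const uint8_t *src, const uint8_t *dst,
                                  size_t cfg_len, size_t pos)
{
    if (pos + VSC_HDR_LEN > cfg_len)
        return false;
    if (src[pos] != PCI_CAP_ID_VNDR || dst[pos] != PCI_CAP_ID_VNDR)
        return false;
    return src[pos + VSC_NEXT_OFF] == dst[pos + VSC_NEXT_OFF] &&
           src[pos + VSC_LEN_OFF] == dst[pos + VSC_LEN_OFF];
}
```

A vendor driver that is sensitive to the payload bytes would pass this
check and could still break after migration, which is the concern above.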

> > >
> > >  - We've discussed in the community and seem to have a consensus that a
> > >    DVSEC (Designated Vendor Specific Extended Capability) could be
> > >    defined to describe unused configuration space.  Such a DVSEC could
> > >    be implemented natively by the device or supplied by a vfio-pci
> > >    variant driver.  There is currently no definition of such a DVSEC.
> >
> > I'm not sure whether DVSEC is still that necessary if the direction is
> > to go userspace-defined layout. In a synthetic world the unused
> > physical space doesn't really matter.
> >
> > So this consensus IMHO was better placed under the umbrella of
> > the other direction having the kernel define the layout.
> 
> I agree that we don't seem to be headed in a direction that requires
> this, but I just wanted to include that there was a roughly agreed upon
> way for devices and variant drivers to annotate unused config space
> ranges for higher levels.  If we head in a direction where the VMM
> chooses an offset for the PASID capability, we need to keep track of
> whether this DVSEC comes to fruition and how that affects the offset
> that QEMU might choose.

Yes. We can keep this option in case there is a demand, especially
if the file-based synthesized scheme won't be built in one day and
we need a default policy in VMM.

> 
> > > So what are we trying to accomplish here.  PASID is the first
> > > non-device specific virtual capability that we'd like to insert into
> > > the VM view of the capability chain.  It won't be the last.
> > >
> > >  - Do we push the policy of defining the capability offset to the user?
> >
> > Looks yes as I didn't see a strong argument for the opposite way.
> 
> It's a policy choice though, so where and how is it implemented?  It
> works fine for those of us willing to edit xml or launch VMs by command
> line, but libvirt isn't going to sign up to insert a policy choice for
> a device.  If we get to even higher level tools, does anything that
> wants to implement PASID support require a vendor operator driver to
> make such policy choices (btw, I'm just throwing out the "operator"
> term as if I know what it means, I don't).

I had a rough feeling that there might be other usages requiring such a
vendor plugin, e.g. provisioning VF/ADI may require vendor specific
configurations, but I'm not really an expert in this area.

Overall I feel most of our discussions so far are about VMM-auto-
find-offset vs. file-based-policy-scheme which both belong to
user-defined policy, suggesting that we all agreed to drop the other
way having kernel define the offset (plus in-kernel quirks, etc.)?

Even the said DVSEC is there to assist the user-defined direction.

> 
> > >  - Do we do some hand waving that devices supporting PASID shouldn't
> > >    have hidden registers and therefore the VMM can simply find a gap?
> >
> > I assume 'handwaving' doesn't mean any measure in code to actually
> > block those devices (as doing so likely requires certain denylist based on
> > device/vendor ID but then why not going a step further to also hard
> > code an offset?). It's more a try-and-fail model where vPASID is opted
> > in via a cmdline parameter then a device with hidden registers may
> > misbehave if the VMM happens to find a conflict gap. And the impact
> > is restricted only to a new setup where the user is interested in
> > PASID to opt hence can afford diagnostics effort to figure out the
> > restriction.
> 
> If you want to hard code an offset then we're effectively introducing a
> device specific quirk to enable PASID support.  I thought we wanted
> this to work generically for any device exposing PASID, therefore I was
> thinking more of "find a gap" as the default strategy with quirks used
> to augment the resulting offset where necessary.
> 
> I'd also be careful about command line parameters.  I think we require
> one for the vIOMMU to enable PASID support, but I'd prefer to avoid one
> on the vfio-pci device, instead simply enabling support when both the
> vIOMMU support is enabled and the device is detected to support it.
> Each command line option requires support in the upper level tools to
> enable it.

Makes sense. btw will there be a requirement that the user wants to
disable PASID even if the device supports it, e.g. for testing purposes
or to work around a HW erratum disclosed after the host driver claims
the support in an old kernel?

> > >
> > > I understand the desire to make some progress, but QEMU relies on
> > > integration with management tools, so a temporary option for a user to
> > > specify a PASID offset in isolation sounds like a non-starter to me.
> > >
> > > This might be a better sell if the user interface allowed fully
> > > defining the capability chain layout from the command line and this
> > > interface would continue to exist and supersede how the VMM might
> > > otherwise define the capability chain when used.  A fully user defined
> > > layout would be complicated though, so I think there would still be a
> > > desire for QEMU to consume or define a consistent policy itself.
> > >
> > > Even if QEMU defines the layout for a device, there may be multiple
> > > versions of that device.  For example, maybe we just add PASID now, but
> > > at some point we decide that we do want to replicate the PF serial
> > > number capability.  At that point we have versions of the device which
> > > would need to be tied to versions of the machine and maybe also
> > > selected via a profile switch on the device command line.
> > >
> > > If we want to simplify this, maybe we do just look at whether the
> > > vIOMMU is configured for PASID support and if the device supports it,
> >
> > and this is related to the open which I raised in last mail - whether we
> > want to report the PASID support both in iommufd and vfio-pci uAPI.
> >
> > My impression is yes as there may be requirement of exposing a virtual
> > capability which doesn't rely on the IOMMU.
> 
> What's the purpose of reporting PASID via both iommufd and vfio-pci?  I
> agree that there will be capabilities related to the iommufd and
> capabilities only related to the device, but I disagree that that
> provides justification to report PASID via both uAPIs.  Are we also
> going to ask iommufd to report that a device has an optional serial
> number capability?  It clearly doesn't make sense for iommufd to be

Certainly not. My point was that vfio-pci/iommufd each reports its
own capability set. They may overlap but this fact just matches the
physical world.

> involved with that, so why does it make sense for vfio-pci to be
> involved in reporting something that is more iommufd specific?

It doesn't matter which one is more involved. It's more akin to the
physical world.

btw vfio-pci already reports ATS/PRI in vconfig space, and both rely
on iommufd. Throwing PASID alone to iommufd uAPI lacks a good
justification for why it's special.

I envision an extension to vfio device feature or a new vfio uAPI
for reporting virtual capabilities, to augment the ones filled in
vconfig space.

> 
> > > then we just look for a gap and add the capability.  If we end up with
> > > different results between source and target for migration, then
> > > migration will fail.  Possibly we end up with a quirk table to override
> > > the default placement of specific capabilities on specific devices.
> >
> > emm how does a quirk table work with devices having volatile config
> > space layout cross FW versions? Can VMM assigned with a VF be able
> > to check the FW version of the PF?
> 
> If the VMM can't find the same gap between source and destination then
> a quirk could make sure that the PASID offset is consistent.  But also
> if the VMM doesn't find the same gap then that suggests the config
> space is already different and not only the offset of the PASID
> capability will need to be fixed via a quirk, so then we're into
> quirking the entire capability space for the device.

yes. So the quirk table is more for fixing the functional gap (i.e. not
overlapping with a hidden register) than for migration. As long as
a device can function correctly with it, the virtual caps fall under the
same restriction as physical caps in migration, i.e. upon inconsistent
layout between src/dest we'll need a separate way to synthesize the
entire space.
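
For reference, the "find a gap by default, quirk where needed" model
discussed here could look roughly like the sketch below. Nothing in it
comes from a posted patch: the struct and function names are
hypothetical, the 'used' ranges are assumed to have been collected by
the VMM from the visible capability chain (so hidden registers are by
definition missing from it), and the PASID extended capability is taken
to be 8 bytes (header, capability register, control register).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCIE_EXT_CFG_START 0x100u  /* extended config space starts here */
#define PCIE_CFG_END       0x1000u /* end of the 4KB config space */
#define PASID_CAP_LEN      8u      /* header (4) + cap (2) + control (2) */

/* Hypothetical: a range already occupied in config space. */
struct cfg_range {
    uint16_t start;
    uint16_t len;
};

/* Hypothetical: per-device override of the PASID capability offset. */
struct pasid_quirk {
    uint16_t vendor;
    uint16_t device;
    uint16_t offset;
};

/*
 * Return the lowest dword-aligned offset in extended config space where
 * 'len' bytes overlap none of the 'used' ranges, or 0 if nothing fits.
 */
static uint16_t find_ext_cap_gap(const struct cfg_range *used, size_t n,
                                 uint16_t len)
{
    assert(len > 0 && len % 4 == 0);

    for (uint32_t off = PCIE_EXT_CFG_START; off + len <= PCIE_CFG_END;
         off += 4) {
        int clash = 0;

        for (size_t i = 0; i < n; i++) {
            uint32_t s = used[i].start;
            uint32_t e = s + used[i].len;

            if (off < e && s < off + len) {
                clash = 1;
                break;
            }
        }
        if (!clash)
            return (uint16_t)off;
    }
    return 0;
}

/* A quirk entry, if present, wins over the automatically found gap. */
static uint16_t pick_pasid_offset(uint16_t vendor, uint16_t device,
                                  const struct pasid_quirk *quirks, size_t nq,
                                  const struct cfg_range *used, size_t n)
{
    for (size_t i = 0; i < nq; i++) {
        if (quirks[i].vendor == vendor && quirks[i].device == device)
            return quirks[i].offset;
    }
    return find_ext_cap_gap(used, n, PASID_CAP_LEN);
}
```

The quirk here only moves the capability away from a hidden register.
It does nothing for the case where src/dest already see different
'used' ranges, which is the migration problem noted above.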

> 
> The VMM should not be assumed to have any additional privileges beyond
> what we provide it through the vfio device and iommufd interface.
> Testing anything about the PF would require access on the host that
> won't work in more secure environments.  Therefore if we can't
> consistently place the PASID for a device, we probably need to quirk it
> based on the vendor/device IDs or sub-IDs or we need to rely on a
> management implied policy such as a device profile option on the QEMU
> command line or maybe different classes of the vfio-pci driver in QEMU.
> 
> > > That might evolve into a lookup for where we place all capabilities,
> > > which essentially turns into the "file" where the VMM defines the entire
> > > layout for some devices.
> >
> > Overall this sounds a feasible path to move forward - starting with
> > the VMM to find the gap automatically if a new PASID option is
> > opted in. Devices with hidden registers may fail. Devices with volatile
> > config space due to FW upgrade or cross vendors may fail to migrate.
> > Then evolving it to the file-based scheme, and there is time to discuss
> > any intermediate improvement (fixed quirks, cmdline offset, etc.) in
> > between.
> 
> As above, let's be careful about introducing unnecessary command line
> options, especially if we expect support for them in higher level
> tools.  If we place the PASID somewhere that makes the device not work,
> then disabling PASID on the vIOMMU should resolve that.  It won't be a

vIOMMU is per-platform, so disabling PASID there applies to all devices
behind it, including those which don't have a problem with the
auto-selected offset. Not sure whether one would want to continue
enabling PASID for the other devices, or should stop immediately to find
a quirk for the problematic one and then resume.

> regression, it will only be an incompatibility with a new feature.
> That incompatibility may require a quirk to resolve to have the PASID
> placed somewhere else.  If the PASID is placed at different offsets
> based on device firmware or vendor then the location of the PASID alone
> isn't the only thing preventing migration and we'll need to introduce
> code for the VMM to take ownership of the capability layout at that
> point.  Thanks,
> 

Yes, the migration issue might be solved in a separate track as it applies
to both physical and virtual capabilities.