Re: [PATCH v2 0/4] vfio-pci support pasid attach/detach

Alex Williamson <alex.williamson@xxxxxxxxxx> · Tue, 30 Jul 2024 11:35:17 -0600

On Wed, 24 Jul 2024 02:26:20 +0000
"Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:

> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Tuesday, April 30, 2024 1:45 AM
> > 
> > On Fri, Apr 26, 2024 at 02:13:54PM -0600, Alex Williamson wrote:  
> > > Regarding "if we accept that text file configuration should be
> > > something the VMM supports", I'm not on board with this yet, so
> > > applying it to PASID discussion seems premature.  
> > 
> > Sure, I'm just explaining a way this could all fit together.
> >   
> 
> Thinking more along this direction.
> 
> I'm not sure how long it will take to standardize such text files and
> share them across VMMs. We may need a way to move in steps in
> Qemu to unblock the kernel development toward that end goal, e.g.
> first accepting a pasid option plus user-specified offset (if offset
> unspecified then auto-pick one in cap holes). Later when the text
> file is ready then such per-cap options can be deprecated.

Planned obsolescence is a hard sell.

> This simple way won't fix the migration issue, but at least it's on
> par with physical caps (i.e. fail the migration if offset mismatched
> between dest/src) and both will be fixed when the text file model
> is ready.
> 
> Then look at what uAPI is required to report the vPASID cap.
> 
> In earlier discussion it's leaning toward extending GET_HW_INFO
> in iommufd given both iommu/pci support are required to get
> PASID working and iommu driver will not report such support until
> pasid has been enabled in both iommu/pci. With that there is no
> need to further report PASID in vfio-pci.
> 
> But there may be other caps which are shared between VF and
> PF while having nothing to do with the iommu. e.g. the Device
> Serial Number extended cap (permitted but not recommended
> in VF). If there is a need to report such cap on VF which doesn't
> implement it to userspace, a vfio uAPI (device_feature or a new
> one dedicated to synthetical vcap) appears to be inevitable.
> 
> So I wonder whether we leave this part untouched until a real
> demand comes or use vpasid to formalize that uAPI to be forward
> looking. If in the end such uAPI will exist then it's a bit weird to
> have PASID escaped (especially when vfio-pci already reports
> PRI/ATS  which have iommu dependency too in vconfig).
> 
> In concept the Qemu logic will be clearer if any PCI caps (real
> or synthesized) is always conveyed via vfio-pci while iommufd is
> for identifying a viommu cap.

There are so many moving pieces here and the discussion trailed off a
long time ago.  I have trouble keeping all the relevant considerations
in my head, so let me try to enumerate them, please correct/add.

 - The PASID capability cannot be implemented on VFs per the PCIe spec.
   All VFs share the PF PASID configuration.  This also implies that
   the VF PASID capability is essentially emulated since the VF driver
   cannot manipulate the PF PASID directly.

 - VFIO does not currently expose the PASID capability for PFs, nor
   does anything construct a vPASID capability for VFs.

 - The PASID capability is only useful in combination with a vIOMMU
   with PASID support, which does not yet exist in QEMU.

 - Some devices are known to place registers in configuration space,
   outside of the capability chains, which historically makes it
   difficult to place a purely virtual capability without potentially
   masking such hidden registers.  Current virtual capabilities are
   placed at vendor defined fixed locations to avoid conflicts.

 - There is some expectation that otherwise compatible devices may
   not present identical capability chains, for example devices running
   different firmware or devices from different vendors implementing a
   standard register ABI (virtio) where capability chain layout is not
   standardized.

 - There have been arguments that the layout of device capabilities is
   a policy choice, where both the kernel and libvirt traditionally try
   to avoid making policy decisions.

 - Seamless live migration of devices requires that configuration space
   remains at least consistent, if not identical for much of it.
   Capability offsets cannot change during live migration.  This leads
   to the text file reference above, which is essentially just the
   notion that if the VMM defines the capability layout in config
   space, it would need to do so via a static reference, independent of
   the layout of the physical device.  We might also want to share that
   reference among multiple VMMs.

 - For a vfio-pci device to support live migration it must be enabled
   to do so by a vfio-pci variant driver.

 - We've discussed in the community and seem to have a consensus that a
   DVSEC (Designated Vendor Specific Extended Capability) could be
   defined to describe unused configuration space.  Such a DVSEC could
   be implemented natively by the device or supplied by a vfio-pci
   variant driver.  There is currently no definition of such a DVSEC.
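
To make the offset consistency point above concrete, here is a rough
Python sketch (not QEMU or kernel code) that walks a PCIe extended
capability chain and reports capabilities that sit at different offsets
on the source and target.  The header layout (ID in bits 15:0, next
pointer in bits 31:20, chain starting at 0x100) is from the PCIe spec;
the function names and the make_cfg() helper are made up for
illustration, and only the first instance of each capability ID is
tracked, which is a simplification.

```python
import struct

PCI_EXT_CAP_START = 0x100      # first extended capability header
PCI_CFG_SPACE_EXP_SIZE = 4096  # PCIe extended configuration space size


def walk_ext_caps(cfg: bytes) -> dict:
    """Map extended capability ID -> offset by following next pointers."""
    caps = {}
    seen = set()               # guards against a looping chain
    off = PCI_EXT_CAP_START
    while PCI_EXT_CAP_START <= off <= len(cfg) - 4 and off not in seen:
        seen.add(off)
        (hdr,) = struct.unpack_from("<I", cfg, off)
        if hdr in (0, 0xFFFFFFFF):   # empty chain or no response
            break
        cap_id = hdr & 0xFFFF        # bits 15:0
        caps.setdefault(cap_id, off) # simplification: first instance only
        off = (hdr >> 20) & 0xFFC    # bits 31:20, dword aligned
    return caps


def layout_mismatches(src: dict, dst: dict) -> list:
    """List (cap_id, src_off, dst_off) for capabilities whose offsets differ."""
    out = []
    for cap_id in sorted(set(src) | set(dst)):
        if src.get(cap_id) != dst.get(cap_id):
            out.append((cap_id, src.get(cap_id), dst.get(cap_id)))
    return out


def make_cfg(chain):
    """Hypothetical helper: build a config image from [(offset, cap_id)]."""
    cfg = bytearray(PCI_CFG_SPACE_EXP_SIZE)
    for i, (off, cap_id) in enumerate(chain):
        nxt = chain[i + 1][0] if i + 1 < len(chain) else 0
        struct.pack_into("<I", cfg, off, (nxt << 20) | (1 << 16) | cap_id)
    return bytes(cfg)
```

A non-empty result from layout_mismatches() is the case where migration
would have to fail.  Note the walk only sees what the chain describes,
so it says nothing about hidden registers outside the chain.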

So what are we trying to accomplish here?  PASID is the first
non-device-specific virtual capability that we'd like to insert into
the VM view of the capability chain.  It won't be the last.

 - Do we push the policy of defining the capability offset to the user?

 - Do we do some hand waving that devices supporting PASID shouldn't
   have hidden registers and therefore the VMM can simply find a gap?

 - Do we ask the hardware vendor or variant driver to insert a DVSEC to
   identify available config space?

 - Do we handle this as just another device quirk, where we maintain a
   table of supported devices and vPASID offset for each?

 - Do we consider this an inflection point where the VMM entirely takes
   over the layout of the capability spaces to impose a stable
   migration layout?  On what basis do we apply that inflection?

 - Also, do we require the same policy for both standard and extended
   capability chains?

I understand the desire to make some progress, but QEMU relies on
integration with management tools, so a temporary option for a user to
specify a PASID offset in isolation sounds like a non-starter to me.

This might be a better sell if the user interface allowed fully
defining the capability chain layout from the command line and this
interface would continue to exist and supersede how the VMM might
otherwise define the capability chain when used.  A fully user defined
layout would be complicated though, so I think there would still be a
desire for QEMU to consume or define a consistent policy itself.

Even if QEMU defines the layout for a device, there may be multiple
versions of that device.  For example, maybe we just add PASID now, but
at some point we decide that we do want to replicate the PF serial
number capability.  At that point we have versions of the device which
would need to be tied to versions of the machine and maybe also
selected via a profile switch on the device command line.

If we want to simplify this, maybe we do just look at whether the
vIOMMU is configured for PASID support and if the device supports it,
then we just look for a gap and add the capability.  If we end up with
different results between source and target for migration, then
migration will fail.  Possibly we end up with a quirk table to override
the default placement of specific capabilities on specific devices.
That might evolve into a lookup for where we place all capabilities,
which essentially turns into the "file" where the VMM defines the entire
layout for some devices.
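
As a rough illustration of the "find a gap, with a quirk table as
override" idea in the paragraph above, here is a minimal Python sketch.
It is not an existing QEMU interface.  The quirk table contents, the
function names, and the assumption that the caller already knows every
claimed range (including any hidden registers) are all hypothetical.
The 8-byte size is the PASID extended capability: a 4-byte header plus
the 2-byte capability and control registers.

```python
PCI_EXT_CAP_START = 0x100
PCI_CFG_SPACE_EXP_SIZE = 4096
PASID_CAP_SIZE = 8  # header + PASID capability and control registers

# Hypothetical quirk table: (vendor_id, device_id) -> fixed vPASID offset.
VPASID_QUIRKS = {(0x1234, 0x5678): 0x300}


def find_gap(claimed, size, start=PCI_EXT_CAP_START,
             end=PCI_CFG_SPACE_EXP_SIZE):
    """Return the lowest dword-aligned offset with `size` free bytes, or None.

    `claimed` is a list of (offset, length) ranges already in use.  The
    capability chain alone cannot provide this, since hidden registers
    are by definition not described by it.
    """
    off = start
    for c_off, c_len in sorted(claimed):
        if c_off - off >= size:
            return off
        # Skip past this range, rounding up to the next dword boundary.
        off = max(off, (c_off + c_len + 3) & ~3)
    return off if end - off >= size else None


def place_vpasid(vendor_id, device_id, claimed, quirks=VPASID_QUIRKS):
    """Use the quirk offset when one exists, else take the first gap."""
    fixed = quirks.get((vendor_id, device_id))
    if fixed is not None:
        return fixed
    return find_gap(claimed, PASID_CAP_SIZE)
```

Two otherwise compatible devices with different physical layouts get
different answers from find_gap(), which is the migration failure case
described above.  A quirk entry pins the offset regardless of the
physical layout.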

This is already TL;DR, so I'll end with that before I further drown
the possibility of discussion.  Thanks,

Alex