RE: [PATCH v2 0/4] vfio-pci support pasid attach/detach

"Tian, Kevin" <kevin.tian@xxxxxxxxx> · Wed, 31 Jul 2024 05:15:25 +0000

> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Wednesday, July 31, 2024 1:35 AM
> 
> On Wed, 24 Jul 2024 02:26:20 +0000
> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> 
> > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > Sent: Tuesday, April 30, 2024 1:45 AM
> > >
> > > On Fri, Apr 26, 2024 at 02:13:54PM -0600, Alex Williamson wrote:
> > > > Regarding "if we accept that text file configuration should be
> > > > something the VMM supports", I'm not on board with this yet, so
> > > > applying it to PASID discussion seems premature.
> > >
> > > Sure, I'm just explaining a way this could all fit together.
> > >
> >
> > Thinking more along this direction.
> >
> > I'm not sure how long it will take to standardize such text files and
> > share them across VMMs. We may need a way to move in steps in
> > Qemu to unblock the kernel development toward that end goal, e.g.
> > first accepting a pasid option plus user-specified offset (if offset
> > unspecified then auto-pick one in cap holes). Later when the text
> > file is ready then such per-cap options can be deprecated.
> 
> Planned obsolescence is a hard sell.
> 
> > This simple way won't fix the migration issue, but at least it's on
> > par with physical caps (i.e. fail the migration if offset mismatched
> > between dest/src) and both will be fixed when the text file model
> > is ready.
> >
> > Then look at what uAPI is required to report the vPASID cap.
> >
> > In earlier discussion it's leaning toward extending GET_HW_INFO
> > in iommufd given both iommu/pci support are required to get
> > PASID working and iommu driver will not report such support until
> > pasid has been enabled in both iommu/pci. With that there is no
> > need to further report PASID in vfio-pci.
> >
> > But there may be other caps which are shared between VF and
> > PF while having nothing to do with the iommu. e.g. the Device
> > Serial Number extended cap (permitted but not recommended
> > in VF). If there is a need to report such cap on VF which doesn't
> > implement it to userspace, a vfio uAPI (device_feature or a new
> > one dedicated to synthetical vcap) appears to be inevitable.
> >
> > So I wonder whether we leave this part untouched until a real
> > demand comes or use vpasid to formalize that uAPI to be forward
> > looking. If in the end such uAPI will exist then it's a bit weird to
> > have PASID escaped (especially when vfio-pci already reports
> > PRI/ATS which have iommu dependency too in vconfig).
> >
> > In concept the Qemu logic will be clearer if any PCI cap (real
> > or synthesized) is always conveyed via vfio-pci while iommufd is
> > for identifying a viommu cap.
> 
> There are so many moving pieces here and the discussion trailed off a
> long time ago.  I have trouble keeping all the relevant considerations
> in my head, so let me try to enumerate them, please correct/add.

Thanks for the summary!

> 
>  - The PASID capability cannot be implemented on VFs per the PCIe spec.
>    All VFs share the PF PASID configuration.  This also implies that
>    the VF PASID capability is essentially emulated since the VF driver
>    cannot manipulate the PF PASID directly.
> 
>  - VFIO does not currently expose the PASID capability for PFs, nor
>    does anything construct a vPASID capability for VFs.
> 
>  - The PASID capability is only useful in combination with a vIOMMU
>    with PASID support, which does not yet exist in QEMU.
> 
>  - Some devices are known to place registers in configuration space,
>    outside of the capability chains, which historically makes it
>    difficult to place a purely virtual capability without potentially
>    masking such hidden registers.  Current virtual capabilities are
>    placed at vendor defined fixed locations to avoid conflicts.
> 
>  - There is some expectation that otherwise compatible devices may
>    not present identical capability chains, for example devices running
>    different firmware or devices from different vendors implementing a
>    standard register ABI (virtio) where capability chain layout is not
>    standardized.
> 
>  - There have been arguments that the layout of device capabilities is
>    a policy choice, where both the kernel and libvirt traditionally try
>    to avoid making policy decisions.
> 
>  - Seamless live migration of devices requires that configuration space
>    remains at least consistent, if not identical for much of it.

I didn't quite get this. I thought being consistent means a fully
identical config space from the guest's p.o.v.

>    Capability offsets cannot change during live migration.  This leads
>    to the text file reference above, which is essentially just the
>    notion that if the VMM defines the capability layout in config
>    space, it would need to do so via a static reference, independent of
>    the layout of the physical device and we might want to share that
>    among multiple VMMs.
> 
>  - For a vfio-pci device to support live migration it must be enabled
>    to do so by a vfio-pci variant driver.
> 
>  - We've discussed in the community and seem to have a consensus that a
>    DVSEC (Designated Vendor Specific Extended Capability) could be
>    defined to describe unused configuration space.  Such a DVSEC could
>    be implemented natively by the device or supplied by a vfio-pci
>    variant driver.  There is currently no definition of such a DVSEC.

I'm not sure whether a DVSEC is still necessary if the direction is
to go with a userspace-defined layout. In a synthetic world the unused
physical space doesn't really matter.

So IMHO this consensus is better placed under the umbrella of
the other direction, i.e. having the kernel define the layout.

> 
> So what are we trying to accomplish here.  PASID is the first
> non-device specific virtual capability that we'd like to insert into
> the VM view of the capability chain.  It won't be the last.
> 
>  - Do we push the policy of defining the capability offset to the user?

It looks like yes, as I haven't seen a strong argument for the opposite.

> 
>  - Do we do some hand waving that devices supporting PASID shouldn't
>    have hidden registers and therefore the VMM can simply find a gap?

I assume 'hand waving' doesn't mean any measure in code to actually
block those devices (doing so would likely require a denylist based on
device/vendor ID, but then why not go a step further and also hard
code an offset?). It's more of a try-and-fail model: vPASID is opted
in via a cmdline parameter, and a device with hidden registers may
misbehave if the VMM happens to pick a gap which conflicts with them.
The impact is restricted to new setups where the user is interested in
PASID and opts in, hence can afford the diagnostic effort to figure out
the restriction.
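
To make the try-and-fail model a bit more concrete, here is a rough
sketch of the gap search in Python. It is illustration only, not QEMU
or kernel code: the function name, the example ranges and the hidden
register are all made up. The only facts assumed are that extended
config space spans 0x100-0xfff, that extended caps are DWORD aligned,
and that the PASID extended cap is 8 bytes long.

```python
# Illustrative sketch only; every name and example value is hypothetical.
EXT_CFG_START = 0x100   # PCIe extended config space begins here
EXT_CFG_END = 0x1000    # config space is 4KB in total
PASID_CAP_SIZE = 8      # header (4) + PASID capability (2) + control (2)


def find_cap_gap(occupied, size=PASID_CAP_SIZE):
    """Return the first DWORD-aligned offset with `size` free bytes, or None.

    `occupied` holds (offset, length) ranges the VMM knows about, e.g.
    from walking the capability chain. Hidden registers are, by
    definition, missing from this list.
    """
    cursor = EXT_CFG_START
    for off, length in sorted(occupied):
        if off - cursor >= size:
            return cursor
        # Skip past this range, rounding up to the next DWORD boundary.
        cursor = max(cursor, (off + length + 3) & ~3)
    return cursor if EXT_CFG_END - cursor >= size else None


# Capabilities visible in the chain (made-up layout).
known_caps = [(0x100, 0x48), (0x148, 0x10), (0x200, 0x40)]
vpasid_offset = find_cap_gap(known_caps)

# A device-specific register outside any capability, invisible to the VMM.
hidden_regs = [(0x158, 4)]
conflict = any(off < vpasid_offset + PASID_CAP_SIZE and vpasid_offset < off + length
               for off, length in hidden_regs)
```

In this made-up layout the search settles on the first hole after the
known caps, which is exactly where the hidden register lives, so the
conflict is only discovered when the device misbehaves.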

> 
>  - Do we ask the hardware vendor or variant driver to insert a DVSEC to
>    identify available config space?

As said, I don't think it's necessary if the policy is left to the user.

> 
>  - Do we handle this as just another device quirk, where we maintain a
>    table of supported devices and vPASID offset for each?
> 
>  - Do we consider this an inflection point where the VMM entirely takes
>    over the layout of the capability spaces to impose a stable
>    migration layout?  On what basis do we apply that inflection?
> 
>  - Also, do we require the same policy for both standard and extended
>    capability chains?

I suppose yes.

> 
> I understand the desire to make some progress, but QEMU relies on
> integration with management tools, so a temporary option for a user to
> specify a PASID offset in isolation sounds like a non-starter to me.
> 
> This might be a better sell if the user interface allowed fully
> defining the capability chain layout from the command line and this
> interface would continue to exist and supersede how the VMM might
> otherwise define the capability chain when used.  A fully user defined
> layout would be complicated though, so I think there would still be a
> desire for QEMU to consume or define a consistent policy itself.
> 
> Even if QEMU defines the layout for a device, there may be multiple
> versions of that device.  For example, maybe we just add PASID now, but
> at some point we decide that we do want to replicate the PF serial
> number capability.  At that point we have versions of the device which
> would need to be tied to versions of the machine and maybe also
> selected via a profile switch on the device command line.
> 
> If we want to simplify this, maybe we do just look at whether the
> vIOMMU is configured for PASID support and if the device supports it,

and this is related to the open question which I raised in my last
mail: whether we want to report PASID support in both the iommufd and
vfio-pci uAPIs.

My impression is yes, as there may be a requirement to expose a virtual
capability which doesn't rely on the IOMMU.

> then we just look for a gap and add the capability.  If we end up with
> different results between source and target for migration, then
> migration will fail.  Possibly we end up with a quirk table to override
> the default placement of specific capabilities on specific devices.

Hmm, how does a quirk table work with devices whose config space
layout is volatile across FW versions? Can a VMM which is assigned a
VF check the FW version of the PF?
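
A rough Python sketch of the concern, purely illustrative: the table,
the IDs, the offsets and the two FW layouts below are all invented. It
assumes the quirk table can only be keyed by what the VMM sees on the
VF, i.e. vendor/device ID.

```python
# Illustrative sketch only; IDs, offsets and layouts are made up.
PASID_CAP_SIZE = 8

# Hypothetical quirk table: (vendor_id, device_id) -> fixed vPASID offset.
VPASID_QUIRKS = {(0x1234, 0x5678): 0x400}


def quirk_offset(vendor_id, device_id):
    """Look up a fixed vPASID offset; None means no quirk is known."""
    return VPASID_QUIRKS.get((vendor_id, device_id))


def overlaps(offset, size, ranges):
    """True if [offset, offset + size) intersects any (off, length) range."""
    return any(off < offset + size and offset < off + length
               for off, length in ranges)


# Same vendor/device ID, but a FW update moved one capability.
fw_a_caps = [(0x100, 0x48), (0x200, 0x40)]
fw_b_caps = [(0x100, 0x48), (0x3f0, 0x40)]

off = quirk_offset(0x1234, 0x5678)
ok_on_fw_a = not overlaps(off, PASID_CAP_SIZE, fw_a_caps)
ok_on_fw_b = not overlaps(off, PASID_CAP_SIZE, fw_b_caps)
```

Both layouts hit the same table entry, yet the fixed offset is only
free on the first one. Unless the FW version can be part of the key,
the table cannot tell the two apart.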

> That might evolve into a lookup for where we place all capabilities,
> which essentially turns into the "file" where the VMM defines the entire
> layout for some devices.

Overall this sounds like a feasible path forward: start with the VMM
finding the gap automatically if a new PASID option is opted in.
Devices with hidden registers may fail. Devices with volatile config
space due to FW upgrades or differing vendors may fail to migrate.
Then evolve it to the file-based scheme, with time to discuss any
intermediate improvement (fixed quirks, cmdline offset, etc.) in
between.
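
For the migration part, the check I have in mind is nothing more than
comparing the guest-visible layouts on both sides. A minimal Python
sketch follows, again illustration only: the function name and the
example offsets are made up, and a layout is simplified to a map from
extended cap ID to offset (0x0001 AER, 0x000f ATS, 0x001b PASID).

```python
# Illustrative sketch only; the helper name and offsets are hypothetical.
def layout_mismatches(src, dst):
    """Return {cap_id: (src_offset, dst_offset)} for every difference.

    A capability present on only one side shows up with None for the
    other side. An empty result means the layouts are identical.
    """
    return {cap: (src.get(cap), dst.get(cap))
            for cap in sorted(src.keys() | dst.keys())
            if src.get(cap) != dst.get(cap)}


# Source and destination picked different gaps for the vPASID cap.
src_layout = {0x0001: 0x100, 0x000f: 0x148, 0x001b: 0x158}
dst_layout = {0x0001: 0x100, 0x000f: 0x148, 0x001b: 0x160}

mismatches = layout_mismatches(src_layout, dst_layout)
migration_allowed = not mismatches
```

Here the auto-picked PASID offsets differ, so the migration is
refused, which is the same outcome as a physical cap landing at a
different offset today.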

> 
> This is already TL;DR, so I'll end with that before I further drown
> the possibility of discussion.  Thanks,
> 
> Alex