RE: [PATCH v2 0/4] vfio-pci support pasid attach/detach

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Sent: Saturday, April 20, 2024 12:36 AM
> 
> On Fri, 19 Apr 2024 05:52:01 +0000
> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> 
> > > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > Sent: Friday, April 19, 2024 4:38 AM
> > >
> > > On Thu, 18 Apr 2024 17:03:15 +0800
> > > Yi Liu <yi.l.liu@xxxxxxxxx> wrote:
> > >
> > > > On 2024/4/18 08:06, Tian, Kevin wrote:
> > > > >> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > > >> Sent: Thursday, April 18, 2024 7:02 AM
> > > > >>
> > > > >> But we don't actually expose the PASID capability on the PF and as
> > > > >> argued in path 4/ we can't because it would break existing userspace.
> > > > > > Come back to this statement.
> > > > >
> > > > > Does 'break' means that legacy Qemu will crash due to a guest write
> > > > > to the read-only PASID capability, or just a conceptually functional
> > > > > break i.e. non-faithful emulation due to writes being dropped?
> > >
> > > I expect more the latter.
> > >
> > > > > If the latter it's probably not a bad idea to allow exposing the PASID
> > > > > capability on the PF as a sane guest shouldn't enable the PASID
> > > > > capability w/o seeing vIOMMU supporting PASID. And there is no
> > > > > status bit defined in the PASID capability to check back so even
> > > > > if an insane guest wants to blindly enable PASID it will naturally
> > > > > write and done. The only niche case is that the enable bits are
> > > > > defined as RW so ideally reading back those bits should get the
> > > > > latest written value. But probably this can be tolerated?
> > >
> > > Some degree of inconsistency is likely tolerated, the guest is unlikely
> > > to check that a RW bit was set or cleared.  How would we virtualize the
> > > control registers for a VF and are they similarly virtualized for a PF
> > > or would we allow the guest to manipulate the physical PASID control
> > > registers?
> >
> > it's shared so the guest shouldn't be allowed to touch the physical
> > register.
> >
> > Even for PF this is virtualized as the physical control is toggled by
> > the iommu driver today. We discussed before whether there is a
> > value moving the control to device driver but the conclusion is no.
> 
> So in both cases we virtualize the PASID bits in the vfio variant
> driver in order to maintain spec compliant behavior of the register
> (ie. the control bits are RW with no underlying hardware effect and
> capability bits only reflect the features enabled by the host in the
> control register)?

yes

> 
> > > > 4) Userspace assembles a pasid cap and inserts it to the vconfig space.
> > > >
> > > > For PF, step 1) is enough. For VF, it needs to go through all the 4 steps.
> > > > This is a bit different from what we planned at the beginning. But
> sounds
> > > > doable if we want to pursue the staging direction.
> > >
> > > Seems like if we decide that we can just expose the PASID capability
> > > for a PF then we should just have any VF variant drivers also implement
> > > a virtual PASID capability.  In this case DVSEC would only be used to
> >
> > I'm leaning toward this direction now.
> >
> > > provide information for a purely userspace emulation of PASID (in which
> > > case it also wouldn't necessarily need the vfio feature because it
> > > might implicitly know the PASID capabilities of the device).  Thanks,
> > >
> >
> > that's a good point. Then no new contract is required.
> >
> > and allowing variant driver to implement a virtual PASID capability
> > seems also make a room for making a shared variant driver to host
> > a table of virtual capabilities (both offset and content) for VFs, just
> > as discussed in patch4 having a shared driver to host a table for DVSEC?
> 
> Yes, vfio-pci-core would support virtualizing the PF PASID capability
> mapped 1:1 at the physical PASID location.  We should architect that
> support to be easily reused for a driver provided offset for the VF use
> case and then we'd need to decide if a lookup table to associate an
> offset to a VF vendor:device warrants a variant driver (which could be
> shared by multiple devices) or if we'd accept that into vfio-pci-core.

yes

> 
> > Along this route probably most vendors will just extend the table in
> > the shared driver, leading to decreased value on DVSEC and question
> > on its necessity...
> >
> > then it's back to the quirk-in-kernel approach... but if simple enough
> > probably not a bad idea to pursue? 😊
> 
> A DVSEC to express unused config space could still support a generic
> vfio-pci-core or variant driver implementation of PASID virtualization.
> The table lookup would provide a device-specific quirk to a base
> implementation of carving it from DVSEC reported free space.
> 
> The question of whether it should be in kernel or userspace is
> difficult.  There are certainly other capabilities where vfio-pci
> exposes RW registers as read-only and we rely on the userspace VMM to
> emulate them.  We could consider this one of those cases so long as the
> change of exposing PASID as a read-only capability is tolerated for old
> QEMU, new kernel.

old QEMU/new kernel seems OK according to discussions in this thread.

> 
> Then come VFs.  AFAIK, it would not be possible for an unprivileged
> QEMU to inspect the PASID state for the PF.  Therefore I think vfio
> needs to provide that information either in-band (ie. emulated PASID) or
> out-of-band (device feature).  At that point, there's some kernel code
> regardless, which leans me towards virtualizing in the kernel.  I'd
> welcome a complete, coherent proposal that could be done in userspace
> though.  Thanks,
> 

I'm not sure how userspace can fully handle this w/o certain assistance
from the kernel.

So I kind of agree that emulated PASID capability is probably the only
contract which the kernel should provide:
  - mapped 1:1 at the physical location, or
  - constructed at an offset according to DVSEC, or
  - constructed at an offset according to a look-up table

The VMM always scans the vfio pci config space to expose vPASID.

Then the remaining open is what VMM could do when a VF supports
PASID but unfortunately it's not reported by vfio. W/o the capability
of inspecting the PASID state of PF, probably the only feasible option
is to maintain a look-up table in VMM itself and assumes the kernel
always enables the PASID cap on PF.

Thanks
Kevin




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux