On Fri, 19 Apr 2024 05:52:01 +0000 "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote: > > From: Alex Williamson <alex.williamson@xxxxxxxxxx> > > Sent: Friday, April 19, 2024 4:38 AM > > > > On Thu, 18 Apr 2024 17:03:15 +0800 > > Yi Liu <yi.l.liu@xxxxxxxxx> wrote: > > > > > On 2024/4/18 08:06, Tian, Kevin wrote: > > > >> From: Alex Williamson <alex.williamson@xxxxxxxxxx> > > > >> Sent: Thursday, April 18, 2024 7:02 AM > > > >> > > > >> But we don't actually expose the PASID capability on the PF and as > > > >> argued in path 4/ we can't because it would break existing userspace. > > > > > Come back to this statement. > > > > > > > > Does 'break' means that legacy Qemu will crash due to a guest write > > > > to the read-only PASID capability, or just a conceptually functional > > > > break i.e. non-faithful emulation due to writes being dropped? > > > > I expect more the latter. > > > > > > If the latter it's probably not a bad idea to allow exposing the PASID > > > > capability on the PF as a sane guest shouldn't enable the PASID > > > > capability w/o seeing vIOMMU supporting PASID. And there is no > > > > status bit defined in the PASID capability to check back so even > > > > if an insane guest wants to blindly enable PASID it will naturally > > > > write and done. The only niche case is that the enable bits are > > > > defined as RW so ideally reading back those bits should get the > > > > latest written value. But probably this can be tolerated? > > > > Some degree of inconsistency is likely tolerated, the guest is unlikely > > to check that a RW bit was set or cleared. How would we virtualize the > > control registers for a VF and are they similarly virtualized for a PF > > or would we allow the guest to manipulate the physical PASID control > > registers? > > it's shared so the guest shouldn't be allowed to touch the physical > register. > > Even for PF this is virtualized as the physical control is toggled by > the iommu driver today. We discussed before whether there is a > value moving the control to device driver but the conclusion is no. So in both cases we virtualize the PASID bits in the vfio variant driver in order to maintain spec compliant behavior of the register (ie. the control bits are RW with no underlying hardware effect and capability bits only reflect the features enabled by the host in the control register)? > > > 4) Userspace assembles a pasid cap and inserts it to the vconfig space. > > > > > > For PF, step 1) is enough. For VF, it needs to go through all the 4 steps. > > > This is a bit different from what we planned at the beginning. But sounds > > > doable if we want to pursue the staging direction. > > > > Seems like if we decide that we can just expose the PASID capability > > for a PF then we should just have any VF variant drivers also implement > > a virtual PASID capability. In this case DVSEC would only be used to > > I'm leaning toward this direction now. > > > provide information for a purely userspace emulation of PASID (in which > > case it also wouldn't necessarily need the vfio feature because it > > might implicitly know the PASID capabilities of the device). Thanks, > > > > that's a good point. Then no new contract is required. > > and allowing variant driver to implement a virtual PASID capability > seems also make a room for making a shared variant driver to host > a table of virtual capabilities (both offset and content) for VFs, just > as discussed in patch4 having a shared driver to host a table for DVSEC? Yes, vfio-pci-core would support virtualizing the PF PASID capability mapped 1:1 at the physical PASID location. We should architect that support to be easily reused for a driver provided offset for the VF use case and then we'd need to decide if a lookup table to associate an offset to a VF vendor:device warrants a variant driver (which could be shared by multiple devices) or if we'd accept that into vfio-pci-core. > Along this route probably most vendors will just extend the table in > the shared driver, leading to decreased value on DVSEC and question > on its necessity... > > then it's back to the quirk-in-kernel approach... but if simple enough > probably not a bad idea to pursue? 😊 A DVSEC to express unused config space could still support a generic vfio-pci-core or variant driver implementation of PASID virtualization. The table lookup would provide a device-specific quirk to a base implementation of carving it from DVSEC reported free space. The question of whether it should be in kernel or userspace is difficult. There are certainly other capabilities where vfio-pci exposes RW registers as read-only and we rely on the userspace VMM to emulate them. We could consider this one of those cases so long as the change of exposing PASID as a read-only capability is tolerated for old QEMU, new kernel. Then come VFs. AFAIK, it would not be possible for an unprivileged QEMU to inspect the PASID state for the PF. Therefore I think vfio needs to provide that information either in-band (ie. emulated PASID) or out-of-band (device feature). At that point, there's some kernel code regardless, which leans me towards virtualizing in the kernel. I'd welcome a complete, coherent proposal that could be done in userspace though. Thanks, Alex