> From: Alex Williamson <alex.williamson@xxxxxxxxxx> > Sent: Saturday, April 20, 2024 12:36 AM > > On Fri, 19 Apr 2024 05:52:01 +0000 > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote: > > > > From: Alex Williamson <alex.williamson@xxxxxxxxxx> > > > Sent: Friday, April 19, 2024 4:38 AM > > > > > > On Thu, 18 Apr 2024 17:03:15 +0800 > > > Yi Liu <yi.l.liu@xxxxxxxxx> wrote: > > > > > > > On 2024/4/18 08:06, Tian, Kevin wrote: > > > > >> From: Alex Williamson <alex.williamson@xxxxxxxxxx> > > > > >> Sent: Thursday, April 18, 2024 7:02 AM > > > > >> > > > > >> But we don't actually expose the PASID capability on the PF and as > > > > >> argued in path 4/ we can't because it would break existing userspace. > > > > > > Come back to this statement. > > > > > > > > > > Does 'break' means that legacy Qemu will crash due to a guest write > > > > > to the read-only PASID capability, or just a conceptually functional > > > > > break i.e. non-faithful emulation due to writes being dropped? > > > > > > I expect more the latter. > > > > > > > > If the latter it's probably not a bad idea to allow exposing the PASID > > > > > capability on the PF as a sane guest shouldn't enable the PASID > > > > > capability w/o seeing vIOMMU supporting PASID. And there is no > > > > > status bit defined in the PASID capability to check back so even > > > > > if an insane guest wants to blindly enable PASID it will naturally > > > > > write and done. The only niche case is that the enable bits are > > > > > defined as RW so ideally reading back those bits should get the > > > > > latest written value. But probably this can be tolerated? > > > > > > Some degree of inconsistency is likely tolerated, the guest is unlikely > > > to check that a RW bit was set or cleared. How would we virtualize the > > > control registers for a VF and are they similarly virtualized for a PF > > > or would we allow the guest to manipulate the physical PASID control > > > registers? > > > > it's shared so the guest shouldn't be allowed to touch the physical > > register. > > > > Even for PF this is virtualized as the physical control is toggled by > > the iommu driver today. We discussed before whether there is a > > value moving the control to device driver but the conclusion is no. > > So in both cases we virtualize the PASID bits in the vfio variant > driver in order to maintain spec compliant behavior of the register > (ie. the control bits are RW with no underlying hardware effect and > capability bits only reflect the features enabled by the host in the > control register)? yes > > > > > 4) Userspace assembles a pasid cap and inserts it to the vconfig space. > > > > > > > > For PF, step 1) is enough. For VF, it needs to go through all the 4 steps. > > > > This is a bit different from what we planned at the beginning. But > sounds > > > > doable if we want to pursue the staging direction. > > > > > > Seems like if we decide that we can just expose the PASID capability > > > for a PF then we should just have any VF variant drivers also implement > > > a virtual PASID capability. In this case DVSEC would only be used to > > > > I'm leaning toward this direction now. > > > > > provide information for a purely userspace emulation of PASID (in which > > > case it also wouldn't necessarily need the vfio feature because it > > > might implicitly know the PASID capabilities of the device). Thanks, > > > > > > > that's a good point. Then no new contract is required. > > > > and allowing variant driver to implement a virtual PASID capability > > seems also make a room for making a shared variant driver to host > > a table of virtual capabilities (both offset and content) for VFs, just > > as discussed in patch4 having a shared driver to host a table for DVSEC? > > Yes, vfio-pci-core would support virtualizing the PF PASID capability > mapped 1:1 at the physical PASID location. We should architect that > support to be easily reused for a driver provided offset for the VF use > case and then we'd need to decide if a lookup table to associate an > offset to a VF vendor:device warrants a variant driver (which could be > shared by multiple devices) or if we'd accept that into vfio-pci-core. yes > > > Along this route probably most vendors will just extend the table in > > the shared driver, leading to decreased value on DVSEC and question > > on its necessity... > > > > then it's back to the quirk-in-kernel approach... but if simple enough > > probably not a bad idea to pursue? 😊 > > A DVSEC to express unused config space could still support a generic > vfio-pci-core or variant driver implementation of PASID virtualization. > The table lookup would provide a device-specific quirk to a base > implementation of carving it from DVSEC reported free space. > > The question of whether it should be in kernel or userspace is > difficult. There are certainly other capabilities where vfio-pci > exposes RW registers as read-only and we rely on the userspace VMM to > emulate them. We could consider this one of those cases so long as the > change of exposing PASID as a read-only capability is tolerated for old > QEMU, new kernel. old QEMU/new kernel seems OK according to discussions in this thread. > > Then come VFs. AFAIK, it would not be possible for an unprivileged > QEMU to inspect the PASID state for the PF. Therefore I think vfio > needs to provide that information either in-band (ie. emulated PASID) or > out-of-band (device feature). At that point, there's some kernel code > regardless, which leans me towards virtualizing in the kernel. I'd > welcome a complete, coherent proposal that could be done in userspace > though. Thanks, > I'm not sure how userspace can fully handle this w/o certain assistance from the kernel. So I kind of agree that emulated PASID capability is probably the only contract which the kernel should provide: - mapped 1:1 at the physical location, or - constructed at an offset according to DVSEC, or - constructed at an offset according to a look-up table The VMM always scans the vfio pci config space to expose vPASID. Then the remaining open is what VMM could do when a VF supports PASID but unfortunately it's not reported by vfio. W/o the capability of inspecting the PASID state of PF, probably the only feasible option is to maintain a look-up table in VMM itself and assumes the kernel always enables the PASID cap on PF. Thanks Kevin