On Thu, 18 Apr 2024 17:03:15 +0800
Yi Liu <yi.l.liu@xxxxxxxxx> wrote:

> On 2024/4/18 08:06, Tian, Kevin wrote:
> >> From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> >> Sent: Thursday, April 18, 2024 7:02 AM
> >>
> >> On Wed, 17 Apr 2024 09:20:51 -0300
> >> Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> >>
> >>> On Wed, Apr 17, 2024 at 07:16:05AM +0000, Tian, Kevin wrote:
> >>>>> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> >>>>> Sent: Wednesday, April 17, 2024 1:50 AM
> >>>>>
> >>>>> On Tue, Apr 16, 2024 at 08:38:50AM +0000, Tian, Kevin wrote:
> >>>>>>> From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> >>>>>>> Sent: Friday, April 12, 2024 4:21 PM
> >>>>>>>
> >>>>>>> A userspace VMM is supposed to get the details of the
> >>>>>>> device's PASID capability and assemble a virtual PASID
> >>>>>>> capability in a proper offset in the virtual PCI
> >>>>>>> configuration space. While it is still an open on how to
> >>>>>>> get the available offsets. Devices may have hidden bits
> >>>>>>> that are not in the PCI cap chain. For now, there are two
> >>>>>>> options to get the available offsets.[2]
> >>>>>>>
> >>>>>>> - Report the available offsets via ioctl. This requires
> >>>>>>>   device-specific logic to provide available offsets. e.g.,
> >>>>>>>   vfio-pci variant driver. Or may the device provide the
> >>>>>>>   available offset by DVSEC.
> >>>>>>> - Store the available offsets in a static table in
> >>>>>>>   userspace VMM. VMM gets the empty offsets from this
> >>>>>>>   table.
> >>>>>>>
> >>>>>>
> >>>>>> I'm not a fan of requesting a variant driver for every
> >>>>>> PASID-capable VF just for the purpose of reporting a free
> >>>>>> range in the PCI config space.
> >>>>>>
> >>>>>> It's easier to do that quirk in userspace.
> >>>>>>
> >>>>>> But I like Alex's original comment that at least for PF
> >>>>>> there is no reason to hide the offset. there could be a
> >>>>>> flag+field to communicate it. or if there will be a new
> >>>>>> variant VF driver for other purposes e.g. migration it can
> >>>>>> certainly fill the field too.
> >>>>>
> >>>>> Yes, since this has been such a sticking point can we get a
> >>>>> clean series that just enables it for PF and then come with a
> >>>>> solution for VF?
> >>>>>
> >>>>
> >>>> sure but we at least need to reach consensus on a minimal
> >>>> required uapi covering both PF/VF to move forward so the user
> >>>> doesn't need to touch different contracts for PF vs. VF.
> >>>
> >>> Do we? The situation where the VMM needs to wholly make up a
> >>> PASID capability seems completely new and separate from just
> >>> using an existing PASID capability as in the PF case.
> >>
> >> But we don't actually expose the PASID capability on the PF and
> >> as argued in path 4/ we can't because it would break existing
> >> userspace.
> >
> > Come back to this statement.
> >
> > Does 'break' mean that legacy Qemu will crash due to a guest
> > write to the read-only PASID capability, or just a conceptually
> > functional break i.e. non-faithful emulation due to writes being
> > dropped?

I expect more the latter.

> > If the latter it's probably not a bad idea to allow exposing the
> > PASID capability on the PF as a sane guest shouldn't enable the
> > PASID capability w/o seeing vIOMMU supporting PASID. And there is
> > no status bit defined in the PASID capability to check back so
> > even if an insane guest wants to blindly enable PASID it will
> > naturally write and done. The only niche case is that the enable
> > bits are defined as RW so ideally reading back those bits should
> > get the latest written value. But probably this can be tolerated?

Some degree of inconsistency is likely tolerated; the guest is
unlikely to check that a RW bit was set or cleared. How would we
virtualize the control registers for a VF and are they similarly
virtualized for a PF or would we allow the guest to manipulate the
physical PASID control registers?

> > With that then should we consider exposing the PASID capability
> > in PCI config space as the first option? For PF it's simple as
> > how other caps are exposed. For VF a variant driver can also fake
> > the PASID capability or emulate a DVSEC capability for unused
> > space (to motivate the physical implementation so no variant
> > driver is required in the future)
> >
> If kernel exposes pasid cap for PF same as other caps, and in the
> meantime the variant driver chooses to emulate a DVSEC cap, then
> userspace follows the below steps to expose pasid cap to VM.

If we have a variant driver, why wouldn't it expose an emulated PASID
capability rather than a DVSEC if we're choosing to expose PASID for
PFs?

>
> 1) Check if a pasid cap is already present in the virtual config
>    space read from kernel. If no, but user wants pasid, then goto
>    step 2).
> 2) Userspace invokes VFIO_DEVICE_FEATURE to check if the device
>    supports pasid cap. If yes, goto step 3).

Why do we need the vfio feature interface if a physical or virtual
PASID capability on the device exposes the same info?

> 3) Userspace gets an available offset via reading the DVSEC cap.

What's the scenario where we'd have a VF wanting to expose PASID
support which doesn't also have a variant driver that could implement
a virtual PASID?

> 4) Userspace assembles a pasid cap and inserts it to the vconfig
>    space.
>
> For PF, step 1) is enough. For VF, it needs to go through all the
> 4 steps. This is a bit different from what we planned at the
> beginning. But sounds doable if we want to pursue the staging
> direction.

Seems like if we decide that we can just expose the PASID capability
for a PF then we should just have any VF variant drivers also
implement a virtual PASID capability. In this case DVSEC would only
be used to provide information for a purely userspace emulation of
PASID (in which case it also wouldn't necessarily need the vfio
feature because it might implicitly know the PASID capabilities of
the device). Thanks,

Alex
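
For reference, the four userspace steps quoted above could look
roughly like the Python sketch below. It is only an illustration under
stated assumptions, not what any VMM or vfio actually does: the
virtual config space is modelled as a 4 KiB bytearray, the callable
query_pasid_width is a hypothetical stand-in for the
VFIO_DEVICE_FEATURE query in step 2, and read_free_offset is a
hypothetical decoder for whatever layout an emulated DVSEC would use
to advertise free space (no such layout has been defined in this
thread). Only the extended capability header format and the PASID and
DVSEC capability IDs come from the PCIe specification.

```python
import struct

PCI_EXT_CAP_START = 0x100     # first PCIe extended capability header
PCI_EXT_CAP_ID_PASID = 0x1B   # PASID extended capability ID
PCI_EXT_CAP_ID_DVSEC = 0x23   # Designated Vendor-Specific Extended Capability


def read_header(cfg, off):
    """Split an extended capability header into (id, version, next)."""
    (hdr,) = struct.unpack_from("<I", cfg, off)
    return hdr & 0xFFFF, (hdr >> 16) & 0xF, (hdr >> 20) & 0xFFC


def walk_ext_caps(cfg):
    """Yield (offset, cap_id) for every capability in the chain."""
    off, seen = PCI_EXT_CAP_START, set()
    while PCI_EXT_CAP_START <= off <= len(cfg) - 4 and off not in seen:
        seen.add(off)  # guard against a looping chain
        (raw,) = struct.unpack_from("<I", cfg, off)
        if raw in (0, 0xFFFFFFFF):  # empty or unreadable header
            return
        cap_id, _version, nxt = read_header(cfg, off)
        yield off, cap_id
        off = nxt


def find_ext_cap(cfg, wanted):
    """Return the offset of the first capability with ID `wanted`."""
    for off, cap_id in walk_ext_caps(cfg):
        if cap_id == wanted:
            return off
    return None


def insert_pasid_cap(vcfg, off, max_pasid_width):
    """Step 4: write a PASID cap at `off` and link it to the chain tail."""
    caps = list(walk_ext_caps(vcfg))
    if not caps and off != PCI_EXT_CAP_START:
        raise ValueError("empty chain: first cap must sit at 0x100")
    header = PCI_EXT_CAP_ID_PASID | (1 << 16)   # version 1, next = 0
    cap_reg = (max_pasid_width & 0x1F) << 8     # Max PASID Width, bits 12:8
    # Header, PASID Capability register, PASID Control register (cleared).
    struct.pack_into("<IHH", vcfg, off, header, cap_reg, 0)
    if caps:
        tail = caps[-1][0]
        (tail_hdr,) = struct.unpack_from("<I", vcfg, tail)
        struct.pack_into("<I", vcfg, tail, (tail_hdr & 0x000FFFFF) | (off << 20))


def expose_pasid(vcfg, want_pasid, query_pasid_width, read_free_offset):
    """Follow steps 1) to 4); return the PASID cap offset or None."""
    off = find_ext_cap(vcfg, PCI_EXT_CAP_ID_PASID)       # step 1
    if off is not None or not want_pasid:
        return off
    width = query_pasid_width()                          # step 2 (hypothetical)
    if width is None:
        return None
    dvsec = find_ext_cap(vcfg, PCI_EXT_CAP_ID_DVSEC)     # step 3
    if dvsec is None:
        return None
    free_off = read_free_offset(vcfg, dvsec)             # hypothetical layout
    insert_pasid_cap(vcfg, free_off, width)              # step 4
    return free_off
```

In the PF case discussed above, step 1 already finds the capability
and the function returns immediately, which is the "step 1) is enough"
path. If variant drivers emulate the PASID capability directly, as
suggested in the reply, the VF case collapses to the same path and
steps 2 to 4 are only needed for a purely userspace emulation.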