> From: Alex Williamson <alex.williamson@xxxxxxxxxx> > Sent: Thursday, April 18, 2024 4:26 AM > > On Wed, 17 Apr 2024 07:09:52 +0000 > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote: > > > > From: Alex Williamson <alex.williamson@xxxxxxxxxx> > > > Sent: Wednesday, April 17, 2024 1:57 AM > > > > > > On Fri, 12 Apr 2024 01:21:21 -0700 > > > Yi Liu <yi.l.liu@xxxxxxxxx> wrote: > > > > > > > + */ > > > > +struct vfio_device_feature_pasid { > > > > + __u16 capabilities; > > > > +#define VFIO_DEVICE_PASID_CAP_EXEC (1 << 0) > > > > +#define VFIO_DEVICE_PASID_CAP_PRIV (1 << 1) > > > > + __u8 width; > > > > + __u8 __reserved; > > > > +}; > > > > > > Building on Kevin's comment on the cover letter, if we could describe > > > an offset for emulating a PASID capability, this seems like the place > > > we'd do it. I think we're not doing that because we'd like an in-band > > > mechanism for a device to report unused config space, such as a DVSEC > > > capability, so that it can be implemented on a physical device. As > > > noted in the commit log here, we'd also prefer not to bloat the kernel > > > with more device quirks. > > > > > > In an ideal world we might be able to jump start support of that DVSEC > > > option by emulating the DVSEC capability on top of the PASID capability > > > for PFs, but unfortunately the PASID capability is 8 bytes while the > > > DVSEC capability is at least 12 bytes, so we can't implement that > > > generically either. > > > > Yeah, that's a problem. > > > > > > > > I don't know there's any good solution here or whether there's actually > > > any value to the PASID capability on a PF, but do we need to consider > > > leaving a field+flag here to describe the offset for that scenario? > > > > Yes, I prefer to this way. > > > > > Would we then allow variant drivers to take advantage of it? Does this > > > then turn into the quirk that we're trying to avoid in the kernel > > > rather than userspace and is that a problem? Thanks, > > > > > > > We don't want to proactively pursue quirks in the kernel. > > > > But if a variant driver exists for other reasons, I don't see why it > > should be prohibited from deciding an offset to ease the > > userspace. 😊 > > At that point we've turned the corner into an arbitrary policy decision > that I can't defend. A "worthy" variant driver can implement something > through a side channel vfio API, but implementing that side channel > itself is not enough to justify a variant driver? It doesn't make > sense. > > Further, if we have a variant driver, why do we need a side channel for > the purpose of describing available config space when we expect devices > themselves to eventually describe the same through a DVSEC capability? > The purpose of enabling variant drivers is to enhance the functionality > of the device. Adding an emulated DVSEC capability seems like a valid > enhancement to justify a variant driver to me. > > So the more I think about it, it would be easy to add something here > that hints a location for an emulated PASID capability in the VMM, but > it would also be counterproductive to an end goal of having a DVSEC > capability that describes unused config space. The very narrow scope > where that side-band channel would be useful is an unknown PF device > which doesn't implement a DVSEC capability and without intervention > simply behaves as it always has, without PASID support. > > A vendor desiring such support can a) implement DVSEC in the hardware, > b) implement a variant driver emulating a DVSEC capability, or c) > directly modify the VMM to tell it where to place the PASID capability. > I also don't think we should exclude the possibility that b) could turn > into a shared variant driver that knows about multiple devices and has > a table of free config space for each. Option c) is only the last > resort if there's not already 12 bytes of contiguous, aligned free > space to place a DVSEC capability. That seems unlikely. or b) could be a table in vfio_pci_config.c i.e. kind of making vfio-pci as the shared variant driver. > > At some point we need to define the format and use of this DVSEC. Do > we allow (not require) one at every gap in config space that's at least > 12-bytes long and adjust the DVSEC Length to describe longer gaps, or do Does PCI spec allows multiple same-type capabilities co-existing? > we use a single DVSEC to describe a table of ranges throughout extended > (maybe even conventional) config space? The former seems easier, this might be challenging as the table itself requires a contiguous large free block. > especially if we expect a device has a large block of free space, > enough for multiple emulated capabilities and described by a single > DVSEC. Thanks, > yes that sounds simpler.