On Thu, 18 Apr 2024 00:21:36 +0000 "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote: > > From: Alex Williamson <alex.williamson@xxxxxxxxxx> > > Sent: Thursday, April 18, 2024 4:26 AM > > > > On Wed, 17 Apr 2024 07:09:52 +0000 > > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote: > > > > > > From: Alex Williamson <alex.williamson@xxxxxxxxxx> > > > > Sent: Wednesday, April 17, 2024 1:57 AM > > > > > > > > On Fri, 12 Apr 2024 01:21:21 -0700 > > > > Yi Liu <yi.l.liu@xxxxxxxxx> wrote: > > > > > > > > > + */ > > > > > +struct vfio_device_feature_pasid { > > > > > + __u16 capabilities; > > > > > +#define VFIO_DEVICE_PASID_CAP_EXEC (1 << 0) > > > > > +#define VFIO_DEVICE_PASID_CAP_PRIV (1 << 1) > > > > > + __u8 width; > > > > > + __u8 __reserved; > > > > > +}; > > > > > > > > Building on Kevin's comment on the cover letter, if we could describe > > > > an offset for emulating a PASID capability, this seems like the place > > > > we'd do it. I think we're not doing that because we'd like an in-band > > > > mechanism for a device to report unused config space, such as a DVSEC > > > > capability, so that it can be implemented on a physical device. As > > > > noted in the commit log here, we'd also prefer not to bloat the kernel > > > > with more device quirks. > > > > > > > > In an ideal world we might be able to jump start support of that DVSEC > > > > option by emulating the DVSEC capability on top of the PASID capability > > > > for PFs, but unfortunately the PASID capability is 8 bytes while the > > > > DVSEC capability is at least 12 bytes, so we can't implement that > > > > generically either. > > > > > > Yeah, that's a problem. > > > > > > > > > > > I don't know there's any good solution here or whether there's actually > > > > any value to the PASID capability on a PF, but do we need to consider > > > > leaving a field+flag here to describe the offset for that scenario? > > > > > > Yes, I prefer to this way. > > > > > > > Would we then allow variant drivers to take advantage of it? Does this > > > > then turn into the quirk that we're trying to avoid in the kernel > > > > rather than userspace and is that a problem? Thanks, > > > > > > > > > > We don't want to proactively pursue quirks in the kernel. > > > > > > But if a variant driver exists for other reasons, I don't see why it > > > should be prohibited from deciding an offset to ease the > > > userspace. 😊 > > > > At that point we've turned the corner into an arbitrary policy decision > > that I can't defend. A "worthy" variant driver can implement something > > through a side channel vfio API, but implementing that side channel > > itself is not enough to justify a variant driver? It doesn't make > > sense. > > > > Further, if we have a variant driver, why do we need a side channel for > > the purpose of describing available config space when we expect devices > > themselves to eventually describe the same through a DVSEC capability? > > The purpose of enabling variant drivers is to enhance the functionality > > of the device. Adding an emulated DVSEC capability seems like a valid > > enhancement to justify a variant driver to me. > > > > So the more I think about it, it would be easy to add something here > > that hints a location for an emulated PASID capability in the VMM, but > > it would also be counterproductive to an end goal of having a DVSEC > > capability that describes unused config space. The very narrow scope > > where that side-band channel would be useful is an unknown PF device > > which doesn't implement a DVSEC capability and without intervention > > simply behaves as it always has, without PASID support. > > > > A vendor desiring such support can a) implement DVSEC in the hardware, > > b) implement a variant driver emulating a DVSEC capability, or c) > > directly modify the VMM to tell it where to place the PASID capability. > > I also don't think we should exclude the possibility that b) could turn > > into a shared variant driver that knows about multiple devices and has > > a table of free config space for each. Option c) is only the last > > resort if there's not already 12 bytes of contiguous, aligned free > > space to place a DVSEC capability. That seems unlikely. > > or b) could be a table in vfio_pci_config.c i.e. kind of making vfio-pci > as the shared variant driver. We've kind of made the statement that variant drivers should be used for any device specific quirks rather than further extending vfio-pci. That's not to say that there couldn't be a shared variant driver that binds to devices across vendors with device specific knowledge to insert a DVSEC capability. > > At some point we need to define the format and use of this DVSEC. Do > > we allow (not require) one at every gap in config space that's at least > > 12-bytes long and adjust the DVSEC Length to describe longer gaps, or do > > Does PCI spec allows multiple same-type capabilities co-existing? As Yi notes, yes. > > we use a single DVSEC to describe a table of ranges throughout extended > > (maybe even conventional) config space? The former seems easier, > > this might be challenging as the table itself requires a contiguous > large free block. Yes, but DVSEC is an extended capability, so we have a fair bit of address space to work with and the table could always collapse to zero entries to indicate only the DVSEC capability itself is available, so it's really no different in minimum described size to the other approach. Thanks, Alex > > especially if we expect a device has a large block of free space, > > enough for multiple emulated capabilities and described by a single > > DVSEC. Thanks, > > > > yes that sounds simpler.