On Wed, Apr 24, 2024 at 12:38:51PM -0600, Alex Williamson wrote:
> On Wed, 24 Apr 2024 11:15:25 -0300
> Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> > On Wed, Apr 24, 2024 at 05:19:31AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > Sent: Wednesday, April 24, 2024 8:12 AM
> > > >
> > > > On Tue, Apr 23, 2024 at 11:47:50PM +0000, Tian, Kevin wrote:
> > > > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > > > Sent: Tuesday, April 23, 2024 8:02 PM
> > > > > >
> > > > > > It feels simpler if the indicates if PASID and ATS can be supported
> > > > > > and userspace builds the capability blocks.
> > > > >
> > > > > this routes back to Alex's original question about using different
> > > > > interfaces (a device feature vs. PCI PASID cap) for VF and PF.
> > > >
> > > > I'm not sure it is different interfaces..
> > > >
> > > > The only reason to pass the PF's PASID cap is to give free space to
> > > > the VMM. If we are saying that gaps are free space (excluding a list
> > > > of bad devices) then we don't actually need to do that anymore.
> > > >
> > > > VMM will always create a synthetic PASID cap and kernel will always
> > > > suppress a real one.
> > >
> > > oh you suggest that there won't even be a 1:1 map for PF!
> >
> > Right. No real need..
> >
> > > kind of continue with the device_feature method as this series does.
> > > and it could include all VMM-emulated capabilities which are not
> > > enumerated properly from vfio pci config space.
> >
> > 1) VFIO creates the iommufd idev
> > 2) VMM queries IOMMUFD_CMD_GET_HW_INFO to learn if PASID, PRI, etc,
> >    etc is supported
> > 3) VMM locates empty space in the config space
> > 4) VMM figures out where and what cap blocks to create (considering
> >    migration needs/etc)
> > 5) VMM synthesizes the blocks and ties emulation to other iommufd things
> >
> > This works generically for any synthetic vPCI function including a
> > non-vfio-pci one.
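
To make steps 3 to 5 of the quoted list concrete, here is a rough sketch of
what the VMM side could look like. It is only an illustration under stated
assumptions, not a proposal for any real VMM: the helper names
(walk_ext_caps, find_gap, add_pasid_cap) and the cap_sizes table are
hypothetical, the PASID width is assumed to have been learned in step 2, and
the real PASID cap is assumed to be already hidden by the kernel. The sizes
of the existing capabilities have to come from the caller, since the
capability header itself does not encode a length.

```python
import struct

EXT_CAP_START = 0x100        # first PCIe extended capability header
CFG_SIZE = 0x1000            # size of PCIe extended config space
PCI_EXT_CAP_ID_PASID = 0x1B  # PASID extended capability ID
PASID_CAP_LEN = 8            # header + PASID capability reg + control reg


def walk_ext_caps(cfg):
    """Follow the extended capability chain, returning [(offset, cap_id)]."""
    caps, off, seen = [], EXT_CAP_START, set()
    while off and off not in seen:
        if off < EXT_CAP_START or off + 4 > CFG_SIZE:
            break
        seen.add(off)
        (hdr,) = struct.unpack_from("<I", cfg, off)
        if hdr in (0, 0xFFFFFFFF):
            break
        caps.append((off, hdr & 0xFFFF))
        off = (hdr >> 20) & 0xFFC  # next pointer, low two bits reserved
    return caps


def find_gap(used, length):
    """First dword-aligned offset whose [off, off+length) hits no used range."""
    off = EXT_CAP_START
    for start, size in sorted(used):
        if off + length <= start:
            return off
        off = max(off, (start + size + 3) & ~3)
    if off + length <= CFG_SIZE:
        return off
    raise ValueError("no free space in extended config space")


def add_pasid_cap(cfg, cap_sizes, max_width, exec_perm=False, priv_mode=False):
    """Place a synthetic PASID cap in a gap and link it at the chain's tail.

    cap_sizes is a hypothetical, caller-provided map of cap ID to byte length;
    an unknown ID raises KeyError because its extent cannot be guessed.
    """
    caps = walk_ext_caps(cfg)
    used = [(off, cap_sizes[cap_id]) for off, cap_id in caps]
    off = find_gap(used, PASID_CAP_LEN) if caps else EXT_CAP_START
    cap_reg = (int(exec_perm) << 1) | (int(priv_mode) << 2) | ((max_width & 0x1F) << 8)
    # Header: cap ID, version 1, next pointer 0; control register starts cleared.
    struct.pack_into("<IHH", cfg, off, PCI_EXT_CAP_ID_PASID | (1 << 16), cap_reg, 0)
    if caps:
        last = caps[-1][0]
        (hdr,) = struct.unpack_from("<I", cfg, last)
        struct.pack_into("<I", cfg, last, (hdr & 0x000FFFFF) | (off << 20))
    return off
```

A VMM would call add_pasid_cap() on its shadow copy of the config space
(a bytearray) once per device. For migration, the returned offset would have
to be recorded or, better, chosen up front by the VMM configuration rather
than discovered, so that source and destination agree on the layout.
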
>
> Maybe this is the actual value in implementing this in the VMM, one
> implementation can support multiple device interfaces.
>
> > Most likely due to migration needs the exact layout of the PCI config
> > space should be configured to the VMM, including the location of any
> > blocks copied from physical and any blocks synthesized. This is the
> > only way to be sure the config space is actually 100% consistent.
>
> Where is this concern about config space arbitrarily changing coming
> from?

It is important for migration. Today, with the drivers we have, the
devices only have to take care of their own config space layout in HW.
We are not expecting migration drivers to worry about any SW created
config space. SW config space is a new thing.

> > We may also want a DVSEC to indicate free space - but if vendors are
> > going to change their devices I'd rather them change to mark the used
> > space with DVSEC than mark the free space :)
>
> Sure, had we proposed this and had vendor buy-in 10+yrs ago, that'd be
> great

We are applying this going forward to PASID and PRI, which barely exist
on devices at all today. We don't need to worry about those 10+yr old
devices that don't support PASID/PRI in the first place. I think this
makes the problem much smaller.

Jason