Re: [PATCH v2 0/4] vfio-pci support pasid attach/detach

Jason Gunthorpe <jgg@xxxxxxxxxx> · Mon, 29 Apr 2024 14:15:15 -0300

On Sun, Apr 28, 2024 at 06:19:29AM +0000, Tian, Kevin wrote:
> > From: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > Sent: Saturday, April 27, 2024 4:14 AM
> > 
> > On Fri, 26 Apr 2024 11:11:17 -0300
> > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> > 
> > > On Wed, Apr 24, 2024 at 02:13:49PM -0600, Alex Williamson wrote:
> > >
> > > > This is kind of an absurd example to portray as a ubiquitous problem.
> > > > Typically the config space layout is a reflection of hardware whether
> > > > the device supports migration or not.
> > >
> > > Er, all our HW has FW constructed config space. It changes with FW
> > > upgrades. We change it during the life of the product. This has to be
> > > considered..
> > 
> > So as I understand it, the concern is that you have firmware that
> > supports migration, but it also openly hostile to the fundamental
> > aspects of exposing a stable device ABI in support of migration.
> > 
> > > > If a driver were to insert a
> > > > virtual capability, then yes it would want to be consistent about it if
> > > > it also cares about migration.  If the driver needs to change the
> > > > location of a virtual capability, problems will arise, but that's also
> > > > not something that every driver needs to do.
> > >
> > > Well, mlx5 has to cope with this. It supports so many devices with so
> > > many config space layouts :( I don't know if we can just hard wire an
> > > offset to stick in a PASID cap and expect that to work...
> 
> Are those config space layout differences usually also coming with
> mmio-side interface change? 

Not really

> If yes there are more to handle for
> running V1 instance on V2 device and it'd make sense to manage
> everything about compatibility in one place.

It's complicated :|

The config space layout can't change once the device is discovered by
the OS and I have a feeling it can't differ between VFs in a SRIOV.

So even if we did say that a device HW/FW will change it's personality
on the fly, I'm not sure we actually can and still conform to the
PCI specs?

> If we pursue the direction deciding the vconfig layout in VMM, does
> it imply that anything related to mmio layout would also be put in
> VMM too?

I'd guess no, MMIO doesn't have the OS and standards entanglements. If
the FW can reprogram the MMIO layout then when it loads the migration
state, or provisions the device, it should put the MMIO to the right
configuration.

> Do we envision things like above in the variant driver or in VMM?

In the device itself. I think it would be an extreme case for someone
to make a device that had a different MMIO layout that could be
backwards compatible to an old layout and NOT provide a way for the
device HW to go back to the old layout directly. If so that device
would have to mediate MMIO in the VMM or kernel. Depends on the
complexity I suppose where to do it.

> > > Consider standards based migration between wildly different
> > > devices. The devices will not standardize their physical config space,
> > > but an operator could generate a consistent vPCI config space that
> > > works with all the devices in their fleet.
> 
> It's hard to believe that 'wildly different' devices only have difference
> in the layout of vPCI config space. 

Well, I mean a device from vendor A and another device from vendor
B. They don't have anything in common except implementing the
standard.

The standard perscribes the MMIO layout. Both devices have a way to
accept a migration stream defined by a standard. The MMIO will be
adjusted as the standard requires after loading the migration stream.

Config space primiarily remains a problem in this imagined world.

My best view is that it needs to be solved by config space synthesis
in the hypervisor and not via HW/FW on the physical device.

> and looks this community lacks of a clear criteria on what burden
> should be put in the kernel vs. in the VMM.

Right, we've always discussed this. I've fairly consistently wanted to
push things to userspace, principally for security.

> e.g. in earlier nvgrace-gpu discussion a major open was whether
> the PCI bar emulation should be done by the variant driver or
> by the VMM (with variant driver providing a device feature).

Sort of, this was actually also about config space synthesis. If we
had this kind of scheme then perhaps the argument would be different.

> It ends up to be in the variant driver with one major argument
> that doing so avoids the burden in various VMMs.

Yes, that has always been the argument that has won out.

> btw while this discussion may continue some time, I wonder whether
> this vPASID reporting open can be handled separately from the
> pasid attach/detach series so we can move the ball and merge
> something already in agreement. anyway it's just a read-only cap so
> won't affect how VFIO/IOMMUFD handles the pasid related requests.

Yes, this makes sense to me. Ultimately we all agree the config space
will have to be synthesized.

Jason