Re: [PATCH v2 0/4] vfio-pci support pasid attach/detach

Jason Gunthorpe <jgg@xxxxxxxxxx> · Mon, 29 Apr 2024 14:44:42 -0300

On Fri, Apr 26, 2024 at 02:13:54PM -0600, Alex Williamson wrote:
> On Fri, 26 Apr 2024 11:11:17 -0300
> Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> 
> > On Wed, Apr 24, 2024 at 02:13:49PM -0600, Alex Williamson wrote:
> > 
> > > This is kind of an absurd example to portray as a ubiquitous problem.
> > > Typically the config space layout is a reflection of hardware whether
> > > the device supports migration or not.  
> > 
> > Er, all our HW has FW constructed config space. It changes with FW
> > upgrades. We change it during the life of the product. This has to be
> > considered..
> 
> So as I understand it, the concern is that you have firmware that
> supports migration, but it also openly hostile to the fundamental
> aspects of exposing a stable device ABI in support of migration.

Well, that makes it sound rude, but yes that is part of it.

mlx5 is tremendously FW defined. The FW can only cope with migration
in some limited cases today. Making that compatability bigger is
ongoing work.

Config space is one of the areas that has not been addressed.
Currently things are such that the FW won't support migration in
combinations that have different physical config space - so it is not
a problem.

But, in principle, it is an issue. AFAIK, the only complete solution
is for the hypervisor to fully synthesize a stable config space.

So, if we keep this in the kernel then I'd imagine the kernel will
need to grow some shared infrastructure to fully synthezise the config
space - not text file based, but basically the same as what I
described for the VMM.

> > But that won't be true if the kernel is making decisions. The config
> > space layout depends now on the kernel driver version too.
> 
> But in the cases where we support migration there's a device specific
> variant driver that supports that migration.  It's the job of that
> variant driver to not only export and import the device state, but also
> to provide a consistent ABI to the user, which includes the config
> space layout. 

Yes, we could do that, but I'm not sure how it will work in all cases.

> I don't understand why we'd say the device programming ABI itself
> falls within the purview of the device/variant driver, but PCI
> config space is defined by device specific code at a higher level.

The "device programming ABI" doesn't contain any *policy*. The layout
of the config space is like 50% policy. Especially when we start to
talk about standards defined migration. The standards will set the
"device programming ABI" and maybe even specify the migration
stream. They will not, and arguably can not, specify the config space.

Config space layout is substantially policy of the instance type. Even
little things like the vendor IDs can be meaningfully replaced in VMs.

> Regarding "if we accept that text file configuration should be
> something the VMM supports", I'm not on board with this yet, so
> applying it to PASID discussion seems premature.

Sure, I'm just explaining a way this could all fit together.

> We've developed variant drivers specifically to host the device specific
> aspects of migration support.  The requirement of a consistent config
> space layout is a problem that only exists relative to migration.  

Well, I wouldn't go quite so far. Arguably even non-migritable
instance types may want to adjust thier config space. Eg if I'm using
a DPU and I get a NVMe/Virtio PCI function I may want to scrub out
details from the config space to make it more general. Even without
migration.

This already happens today in places like VDPA which completely
replace the underlying config space in some cases.

I see it as a difference from a world of highly constrained "instance
types" and a more ad hoc world.

> is an issue that I would have considered the responsibility of the
> variant driver, which would likely expect a consistent interface from
> the hardware/firmware.  Why does hostile firmware suddenly make it the
> VMM's problem to provide a consistent ABI to the config space of the
> device rather than the variant driver?

It is not "hostile firmware"! It accepting that a significant aspect
of the config layout is actually policy.

Plus the standards limitations that mean we can't change the config
space on the fly make it pretty much impossible for the device to
acutally do anything to help here. Software must fix the config space.

> Obviously config maps are something that a VMM could do, but it also
> seems to impose a non-trivial burden that every VMM requires an
> implementation of a config space map and integration for each device
> rather than simply expecting the exposed config space of the device to
> be part of the migration ABI.  

Well, the flip is true to, it is alot of burden on every variant
device driver implement and on the kernel in general to manage config
space policy on behalf of the VMM.

My point is if the VMM is already going to be forced to manage config
space policy for other good reasons, are we sure we want to put a
bunch of stuff in the kernel that sometimes won't be used?

> Also this solution specifically only addresses config space
> compatibility without considering the more generic issue that a
> variant driver can expose different device personas.  A versioned
> persona and config space virtualization in the variant driver is a
> much more flexible solution.

It is addressed, the different personas would have their own text file
maps. The target VMM would have to load the right map. Shared common
code across all the variant drivers.

Jason