On Fri, Apr 26, 2024 at 02:13:54PM -0600, Alex Williamson wrote:
On Fri, 26 Apr 2024 11:11:17 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
On Wed, Apr 24, 2024 at 02:13:49PM -0600, Alex Williamson wrote:
This is kind of an absurd example to portray as a ubiquitous problem.
Typically the config space layout is a reflection of hardware whether
the device supports migration or not.
Er, all our HW has FW constructed config space. It changes with FW
upgrades. We change it during the life of the product. This has to be
considered.
So as I understand it, the concern is that you have firmware that
supports migration, but it is also openly hostile to the fundamental
aspects of exposing a stable device ABI in support of migration.
Well, that makes it sound rude, but yes that is part of it.
mlx5 is tremendously FW defined. The FW can only cope with migration
in some limited cases today. Making that compatibility bigger is
ongoing work.
Config space is one of the areas that has not been addressed.
Currently things are such that the FW won't support migration in
combinations that have different physical config space - so it is not
a problem.
But, in principle, it is an issue. AFAIK, the only complete solution
is for the hypervisor to fully synthesize a stable config space.
So, if we keep this in the kernel then I'd imagine the kernel will
need to grow some shared infrastructure to fully synthesize the config
space - not text file based, but basically the same as what I
described for the VMM.
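
To make "fully synthesize" concrete, here is a minimal sketch in C of
building a config space purely from a policy description, so the layout
no longer depends on what the FW happens to expose. The struct names
(cap_desc, cfg_policy) and synth_config() are hypothetical and are not
existing kernel or VMM interfaces. Only the register offsets and the
capability header format come from the PCI specification. Capability
bodies are left zeroed; in a real design they would be filled from the
device or from emulation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE            256   /* conventional PCI config space */
#define PCI_VENDOR_ID       0x00
#define PCI_DEVICE_ID       0x02
#define PCI_STATUS          0x06
#define PCI_STATUS_CAP_LIST 0x10  /* status bit 4: capability list present */
#define PCI_CAP_PTR         0x34

/* Hypothetical: one capability the policy wants the guest to see. */
struct cap_desc {
    uint8_t id;      /* PCI capability ID, e.g. 0x11 for MSI-X */
    uint8_t offset;  /* fixed by policy: dword aligned, at or above 0x40 */
    uint8_t len;     /* bytes, including the 2-byte header */
};

/* Hypothetical: the layout an instance type promises to the guest. */
struct cfg_policy {
    uint16_t vendor_id;
    uint16_t device_id;
    const struct cap_desc *caps;  /* sorted by ascending offset */
    size_t ncaps;
};

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/*
 * Build cfg[] from the policy alone. Returns 0 on success, or -1 if a
 * capability is misaligned, overlaps its predecessor, or runs past the
 * end. On failure cfg[] is left partially written.
 */
int synth_config(const struct cfg_policy *pol, uint8_t cfg[CFG_SIZE])
{
    unsigned int prev_end = 0x40;  /* first byte after the standard header */
    size_t i;

    assert(pol != NULL && cfg != NULL);
    memset(cfg, 0, CFG_SIZE);
    put_le16(cfg + PCI_VENDOR_ID, pol->vendor_id);
    put_le16(cfg + PCI_DEVICE_ID, pol->device_id);

    for (i = 0; i < pol->ncaps; i++) {
        const struct cap_desc *c = &pol->caps[i];

        if (c->len < 2 || (c->offset & 3) || c->offset < prev_end ||
            (unsigned int)c->offset + c->len > CFG_SIZE)
            return -1;

        cfg[c->offset] = c->id;
        /* Link to the next capability, or terminate the chain with 0. */
        cfg[c->offset + 1] =
            (i + 1 < pol->ncaps) ? pol->caps[i + 1].offset : 0;
        prev_end = (unsigned int)c->offset + c->len;
    }

    if (pol->ncaps) {
        cfg[PCI_CAP_PTR] = pol->caps[0].offset;
        put_le16(cfg + PCI_STATUS, PCI_STATUS_CAP_LIST);
    }
    return 0;
}
```

Whether the table is fed from a text file in the VMM or from shared
kernel code, the point of the sketch is the same: the offsets are inputs
chosen by policy, not something read back from the device.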
But that won't be true if the kernel is making decisions. The config
space layout depends now on the kernel driver version too.
But in the cases where we support migration there's a device specific
variant driver that supports that migration. It's the job of that
variant driver to not only export and import the device state, but also
to provide a consistent ABI to the user, which includes the config
space layout.
Yes, we could do that, but I'm not sure how it will work in all cases.
I don't understand why we'd say the device programming ABI itself
falls within the purview of the device/variant driver, but PCI
config space is defined by device specific code at a higher level.
The "device programming ABI" doesn't contain any *policy*. The layout
of the config space is like 50% policy. Especially when we start to
talk about standards defined migration. The standards will set the
"device programming ABI" and maybe even specify the migration
stream. They will not, and arguably can not, specify the config space.
Config space layout is substantially policy of the instance type. Even
little things like the vendor IDs can be meaningfully replaced in VMs.
Regarding "if we accept that text file configuration should be
something the VMM supports", I'm not on board with this yet, so
applying it to the PASID discussion seems premature.
Sure, I'm just explaining a way this could all fit together.
We've developed variant drivers specifically to host the device specific
aspects of migration support. The requirement of a consistent config
space layout is a problem that only exists relative to migration.
Well, I wouldn't go quite so far. Arguably even non-migratable
instance types may want to adjust their config space. E.g. if I'm using
a DPU and I get an NVMe/Virtio PCI function I may want to scrub out
details from the config space to make it more general. Even without
migration.
This already happens today in places like VDPA which completely
replace the underlying config space in some cases.
I see it as the difference between a world of highly constrained
"instance types" and a more ad hoc world.
This is an issue that I would have considered the responsibility of the
variant driver, which would likely expect a consistent interface from
the hardware/firmware. Why does hostile firmware suddenly make it the
VMM's problem to provide a consistent ABI to the config space of the
device rather than the variant driver?
It is not "hostile firmware"! It is accepting that a significant aspect
of the config layout is actually policy.
Plus the standards limitations that mean we can't change the config
space on the fly make it pretty much impossible for the device to
actually do anything to help here. Software must fix the config space.
Obviously config maps are something that a VMM could do, but it also
seems to impose a non-trivial burden that every VMM requires an
implementation of a config space map and integration for each device
rather than simply expecting the exposed config space of the device to
be part of the migration ABI.
Well, the flip side is true too: it is a lot of burden on every variant
device driver to implement, and on the kernel in general, to manage
config space policy on behalf of the VMM.
My point is if the VMM is already going to be forced to manage config
space policy for other good reasons, are we sure we want to put a
bunch of stuff in the kernel that sometimes won't be used?
Also this solution specifically only addresses config space
compatibility without considering the more generic issue that a
variant driver can expose different device personas. A versioned
persona and config space virtualization in the variant driver is a
much more flexible solution.
It is addressed, the different personas would have their own text file
maps. The target VMM would have to load the right map. Shared common
code across all the variant drivers.
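
As a rough illustration of "the target VMM would have to load the right
map", the lookup could be as simple as the C sketch below. Everything
here is an assumption made for illustration: the persona_map struct, the
find_map() helper, and the idea of keying on a persona name plus a
layout version are hypothetical, not an existing VMM or vfio interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical registry entry: one config space map per device persona. */
struct persona_map {
    const char *persona;   /* illustrative name, e.g. "nvme-generic" */
    uint32_t version;      /* bumped when the guest-visible layout changes */
    const char *map_path;  /* illustrative location of the map file */
};

/*
 * Return the map matching the persona and version recorded on the
 * migration source, or NULL if the target has no such map, in which
 * case the migration should be refused rather than guessed at.
 */
const struct persona_map *find_map(const struct persona_map *maps, size_t n,
                                   const char *persona, uint32_t version)
{
    for (size_t i = 0; i < n; i++) {
        if (maps[i].version == version &&
            strcmp(maps[i].persona, persona) == 0)
            return &maps[i];
    }
    return NULL;
}
```

An exact match on both fields keeps the guest-visible layout identical on
both ends; anything looser would bring back the compatibility problem the
map is meant to solve.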
Jason