On Fri, Apr 26, 2024 at 02:13:54PM -0600, Alex Williamson wrote:
> On Fri, 26 Apr 2024 11:11:17 -0300
> Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> > On Wed, Apr 24, 2024 at 02:13:49PM -0600, Alex Williamson wrote:
> >
> > > This is kind of an absurd example to portray as a ubiquitous problem.
> > > Typically the config space layout is a reflection of hardware whether
> > > the device supports migration or not.
> >
> > Er, all our HW has FW constructed config space. It changes with FW
> > upgrades. We change it during the life of the product. This has to be
> > considered..
>
> So as I understand it, the concern is that you have firmware that
> supports migration, but it also openly hostile to the fundamental
> aspects of exposing a stable device ABI in support of migration.

Well, that makes it sound rude, but yes that is part of it.

mlx5 is tremendously FW defined. The FW can only cope with migration
in some limited cases today. Making that compatibility bigger is
ongoing work.

Config space is one of the areas that has not been addressed.
Currently things are such that the FW won't support migration in
combinations that have different physical config space - so it is not
a problem.

But, in principle, it is an issue. AFAIK, the only complete solution
is for the hypervisor to fully synthesize a stable config space.

So, if we keep this in the kernel then I'd imagine the kernel will
need to grow some shared infrastructure to fully synthesize the
config space - not text file based, but basically the same as what I
described for the VMM.

> > But that won't be true if the kernel is making decisions. The config
> > space layout depends now on the kernel driver version too.
>
> But in the cases where we support migration there's a device specific
> variant driver that supports that migration. It's the job of that
> variant driver to not only export and import the device state, but also
> to provide a consistent ABI to the user, which includes the config
> space layout.

Yes, we could do that, but I'm not sure how it will work in all cases.

> I don't understand why we'd say the device programming ABI itself
> falls within the purview of the device/variant driver, but PCI
> config space is defined by device specific code at a higher level.

The "device programming ABI" doesn't contain any *policy*. The layout
of the config space is like 50% policy.

Especially when we start to talk about standards defined migration.
The standards will set the "device programming ABI" and maybe even
specify the migration stream. They will not, and arguably can not,
specify the config space.

Config space layout is substantially policy of the instance type. Even
little things like the vendor IDs can be meaningfully replaced in VMs.

> Regarding "if we accept that text file configuration should be
> something the VMM supports", I'm not on board with this yet, so
> applying it to PASID discussion seems premature.

Sure, I'm just explaining a way this could all fit together.

> We've developed variant drivers specifically to host the device specific
> aspects of migration support. The requirement of a consistent config
> space layout is a problem that only exists relative to migration.

Well, I wouldn't go quite so far. Arguably even non-migratable
instance types may want to adjust their config space. E.g. if I'm using
a DPU and I get an NVMe/Virtio PCI function I may want to scrub out
details from the config space to make it more general. Even without
migration.

This already happens today in places like VDPA which completely
replace the underlying config space in some cases.

I see it as a difference between a world of highly constrained
"instance types" and a more ad hoc world.

> is an issue that I would have considered the responsibility of the
> variant driver, which would likely expect a consistent interface from
> the hardware/firmware.
> Why does hostile firmware suddenly make it the
> VMM's problem to provide a consistent ABI to the config space of the
> device rather than the variant driver?

It is not "hostile firmware"! It is accepting that a significant
aspect of the config layout is actually policy.

Plus the standards limitations that mean we can't change the config
space on the fly make it pretty much impossible for the device to
actually do anything to help here. Software must fix the config space.

> Obviously config maps are something that a VMM could do, but it also
> seems to impose a non-trivial burden that every VMM requires an
> implementation of a config space map and integration for each device
> rather than simply expecting the exposed config space of the device to
> be part of the migration ABI.

Well, the flip is true too: it is a lot of burden for every variant
device driver to implement, and for the kernel in general, to manage
config space policy on behalf of the VMM.

My point is if the VMM is already going to be forced to manage config
space policy for other good reasons, are we sure we want to put a
bunch of stuff in the kernel that sometimes won't be used?

> Also this solution specifically only addresses config space
> compatibility without considering the more generic issue that a
> variant driver can expose different device personas. A versioned
> persona and config space virtualization in the variant driver is a
> much more flexible solution.

It is addressed, the different personas would have their own text file
maps. The target VMM would have to load the right map. Shared common
code across all the variant drivers.

Jason