On Tue, 18 Jan 2022 17:00:48 -0400 Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > On Tue, Jan 18, 2022 at 12:55:22PM -0700, Alex Williamson wrote: > > On Fri, 14 Jan 2022 15:35:14 -0400 > > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > > > Clarify how the migration API works by recasting its specification from a > > > bunch of bits into a formal FSM. This describes the same functional > > > outcome, with no practical ABI change. > > > > I don't understand why we're so reluctant to drop the previous > > specification and call this v2. > > I won't object - but I can't say it is really necessary. > > > Yes, it's clever that the enum for the FSM matches the bit > > definitions, but we're also inserting previously invalid states as a > > standard part of the device flow... (see below) > > This is completely invisible to userspace, if userspace never writes > the new states to device_state it can never read them back. > > > > This is RFC because the full series is not fully tested yet, that should be > > > done next week. The series can be previewed here: > > > > > > https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci > > > > > > The mlx5 implementation of the state change: > > > > > > https://github.com/jgunthorpe/linux/blob/0a6416da226fe8ee888aa8026f1e363698e137a8/drivers/vfio/pci/mlx5/main.c#L264 > > > > > > Has turned out very clean. Compare this to the v5 version, which is full of > > > subtle bugs: > > > > > > https://lore.kernel.org/kvm/20211027095658.144468-12-yishaih@xxxxxxxxxx/ > > > > > > This patch adds the VFIO_DEVICE_MIG_ARC_SUPPORTED ioctl: > > > > > > https://github.com/jgunthorpe/linux/commit/c92eff6c2afd1ecc9ed5c67a1f81c7f270f6e940 > > > > > > And this shows how the Huawei driver should opt out of P2P arcs: > > > > > > https://github.com/jgunthorpe/linux/commit/dd2571c481d27546a33ff4583ce8ad49847fe300 > > > > We've been bitten several times by device support that didn't come to > > be in this whole vfio migration effort. > > Which is why this patch is for Huawei not mlx5.. > > > At some point later hns support is ready, it supports the migration > > region, but migration fails with all existing userspace written to the > > below spec. I can't imagine that a device advertising migration, but it > > being essentially guaranteed to fail is a viable condition and we can't > > retroactively add this proposed ioctl to existing userspace binaries. > > I think our recourse here would be to rev the migration sub-type again > > so that userspace that doesn't know about devices that lack P2P won't > > enable migration support. > > Global versions are rarely a good idea. What happens if we have three > optional things, what do you set the version to in order to get > maximum compatibility? > > For the scenario you describe it is much better for qemu to call > VFIO_DEVICE_MIG_ARC_SUPPORTED on every single transition it intends to > use when it first opens the device. If any fail then it can deem the > device as having some future ABI and refuse to use it with migration. This misses the point of the thought experiment, VFIO_DEVICE_MIG_ARC_SUPPORTED is not defined here, it's defined in some adjacent commit. The migration region sub-type is not a global version, it's one of the means we have built in to the device specific region API for allowing compatibility breaks. Userspace that only knows about v1 migration sub-types will not look at v2 sub-types. > > So I think this ends up being a poor example of how to extend the uAPI. > > An opt-out for part of the base specification is hard, it's much easier > > to opt-in P2P as a feature. > > I'm not sure I understand this 'base specification'. > > My remark was how we took current qemu as an ABI added P2P to the > specification and defined it in a way that is naturally backwards > compatible and is still well specified. Ok, this is backwards. In the patch proposed here, ie. what I'm referring to as the base specification, we're adding new states that effectively userspace isn't allowed to use because we're trying to subtly maintain backwards compatibility with existing userspace, but then in a follow-on patch add a new ioctl that userspace is required to use to validate state transitions. Isn't the requirement to use the new ioctl enough to demand a compatibility break? If the order was to propose a new FSM uAPI compatible to the existing bit definitions without the P2P states, then add a new ioctl and P2P states, and require userspace to use the ioctl to validate support for those new P2P states, I might be able to swallow that. The value of maintaining compatibility with the v1 migration sub-type is essentially nil afaict. If we consider proprietary drivers with existing releases that include v1 migration support and hypervisor vendors that might ignore the experimental nature of QEMU support, I'd just rather avoid any potential headaches. In-kernel drivers should expose a v2 migration sub-type based on the FSM uAPI and probing ioctl, v1 is deprecated and dropped from QEMU. Thanks, Alex