Re: [PATCH RFC] vfio: Revise and update the migration uAPI description

Alex Williamson <alex.williamson@xxxxxxxxxx> · Wed, 19 Jan 2022 08:32:22 -0700

On Tue, 18 Jan 2022 17:00:48 -0400
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Tue, Jan 18, 2022 at 12:55:22PM -0700, Alex Williamson wrote:
> > On Fri, 14 Jan 2022 15:35:14 -0400
> > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> >   
> > > Clarify how the migration API works by recasting its specification from a
> > > bunch of bits into a formal FSM. This describes the same functional
> > > outcome, with no practical ABI change.  
> > 
> > I don't understand why we're so reluctant to drop the previous
> > specification and call this v2.   
> 
> I won't object - but I can't say it is really necessary.
> 
> > Yes, it's clever that the enum for the FSM matches the bit
> > definitions, but we're also inserting previously invalid states as a
> > standard part of the device flow... (see below)  
> 
> This is completely invisible to userspace, if userspace never writes
> the new states to device_state it can never read them back.
> 
> > > This is RFC because the full series is not fully tested yet, that should be
> > > done next week. The series can be previewed here:
> > > 
> > >   https://github.com/jgunthorpe/linux/commits/mlx5_vfio_pci
> > > 
> > > The mlx5 implementation of the state change:
> > > 
> > >   https://github.com/jgunthorpe/linux/blob/0a6416da226fe8ee888aa8026f1e363698e137a8/drivers/vfio/pci/mlx5/main.c#L264
> > > 
> > > Has turned out very clean. Compare this to the v5 version, which is full of
> > > subtle bugs:
> > > 
> > >   https://lore.kernel.org/kvm/20211027095658.144468-12-yishaih@xxxxxxxxxx/
> > > 
> > > This patch adds the VFIO_DEVICE_MIG_ARC_SUPPORTED ioctl:
> > > 
> > >   https://github.com/jgunthorpe/linux/commit/c92eff6c2afd1ecc9ed5c67a1f81c7f270f6e940
> > > 
> > > And this shows how the Huawei driver should opt out of P2P arcs:
> > > 
> > >   https://github.com/jgunthorpe/linux/commit/dd2571c481d27546a33ff4583ce8ad49847fe300  
> > 
> > We've been bitten several times by device support that didn't come to
> > be in this whole vfio migration effort.  
> 
> Which is why this patch is for Huawei not mlx5..
> 
> > At some point later hns support is ready, it supports the migration
> > region, but migration fails with all existing userspace written to the
> > below spec.  I can't imagine that a device advertising migration, but it
> > being essentially guaranteed to fail is a viable condition and we can't
> > retroactively add this proposed ioctl to existing userspace binaries.
> > I think our recourse here would be to rev the migration sub-type again
> > so that userspace that doesn't know about devices that lack P2P won't
> > enable migration support.  
> 
> Global versions are rarely a good idea. What happens if we have three
> optional things, what do you set the version to in order to get
> maximum compatibility?
> 
> For the scenario you describe it is much better for qemu to call
> VFIO_DEVICE_MIG_ARC_SUPPORTED on every single transition it intends to
> use when it first opens the device. If any fail then it can deem the
> device as having some future ABI and refuse to use it with migration.

This misses the point of the thought experiment,
VFIO_DEVICE_MIG_ARC_SUPPORTED is not defined here, it's defined in some
adjacent commit.  The migration region sub-type is not a global
version, it's one of the means we have built in to the device specific
region API for allowing compatibility breaks.  Userspace that only
knows about v1 migration sub-types will not look at v2 sub-types.

> > So I think this ends up being a poor example of how to extend the uAPI.
> > An opt-out for part of the base specification is hard, it's much easier
> > to opt-in P2P as a feature.  
> 
> I'm not sure I understand this 'base specification'. 
> 
> My remark was how we took current qemu as an ABI added P2P to the
> specification and defined it in a way that is naturally backwards
> compatible and is still well specified.

Ok, this is backwards.  In the patch proposed here, ie. what I'm
referring to as the base specification, we're adding new states that
effectively userspace isn't allowed to use because we're trying to
subtly maintain backwards compatibility with existing userspace, but
then in a follow-on patch add a new ioctl that userspace is required to
use to validate state transitions.  Isn't the requirement to use the
new ioctl enough to demand a compatibility break?

If the order was to propose a new FSM uAPI compatible to the existing
bit definitions without the P2P states, then add a new ioctl and P2P
states, and require userspace to use the ioctl to validate support for
those new P2P states, I might be able to swallow that.

The value of maintaining compatibility with the v1 migration sub-type
is essentially nil afaict.  If we consider proprietary drivers with
existing releases that include v1 migration support and hypervisor
vendors that might ignore the experimental nature of QEMU support, I'd
just rather avoid any potential headaches.  In-kernel drivers
should expose a v2 migration sub-type based on the FSM uAPI and probing
ioctl, v1 is deprecated and dropped from QEMU.  Thanks,

Alex