On Tue, Jan 18 2022, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Tue, Jan 18, 2022 at 12:55:22PM -0700, Alex Williamson wrote:
>> At some point later hns support is ready, it supports the migration
>> region, but migration fails with all existing userspace written to the
>> below spec. I can't imagine that a device advertising migration, but it
>> being essentially guaranteed to fail is a viable condition and we can't
>> retroactively add this proposed ioctl to existing userspace binaries.
>> I think our recourse here would be to rev the migration sub-type again
>> so that userspace that doesn't know about devices that lack P2P won't
>> enable migration support.
>
> Global versions are rarely a good idea. What happens if we have three
> optional things, what do you set the version to in order to get
> maximum compatibility?
>
> For the scenario you describe it is much better for qemu to call
> VFIO_DEVICE_MIG_ARC_SUPPORTED on every single transition it intends to
> use when it first opens the device. If any fail then it can deem the
> device as having some future ABI and refuse to use it with migration.

Userspace having to discover piecemeal what is and what isn't supported
does not sound like a very good idea. It should be able to figure that
out in one go.

>
>> So I think this ends up being a poor example of how to extend the uAPI.
>> An opt-out for part of the base specification is hard, it's much easier
>> to opt-in P2P as a feature.
>
> I'm not sure I understand this 'base specification'.
>
> My remark was how we took current qemu as an ABI added P2P to the
> specification and defined it in a way that is naturally backwards
> compatible and is still well specified.

I agree with Alex that this approach, while clever, is not a good way to
extend the uapi.
What about leaving the existing migration region alone (so that we don't
break whatever already exists out there) and adding a v2 migration region
that defines a base specification (the mandatory part that everyone must
support) plus a capability mechanism that allows for extensions like P2P?

The base specification should contain only what everybody can and will
need to implement; if we already know that mlx5 will need more, we should
define those additional features as capabilities right from the start.

(I do not object to using an FSM to describe the state transitions; I
have not yet reviewed it.)
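To make the proposal a bit more concrete, here is a minimal sketch of how userspace could decide "in one go" whether migration is usable. It assumes a hypothetical v2 region header carrying a version and a capability bitmask; none of these names (`struct mig_v2_region_header`, `MIG_V2_CAP_P2P`, `MIG_V2_CAP_FUTURE`, `mig_v2_usable()`) exist in the VFIO uapi, they are purely illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical optional-feature bits; NOT part of the real VFIO uapi. */
#define MIG_V2_CAP_P2P    (1ULL << 0) /* device supports the P2P-related states */
#define MIG_V2_CAP_FUTURE (1ULL << 1) /* placeholder for some later extension */

/* Hypothetical header at the start of a v2 migration region. */
struct mig_v2_region_header {
	uint32_t version;  /* layout version; 2 in this sketch */
	uint32_t reserved;
	uint64_t caps;     /* optional features on top of the base specification */
};

/*
 * Single up-front check: version 2 implies the mandatory base
 * specification, and every optional feature userspace relies on must be
 * advertised. Capability bits userspace does not know about are simply
 * ignored, so devices can gain features without breaking old binaries.
 */
bool mig_v2_usable(const struct mig_v2_region_header *hdr,
		   uint64_t required_caps)
{
	if (hdr->version != 2)
		return false;
	return (hdr->caps & required_caps) == required_caps;
}
```

With this shape, an hns-style device that lacks P2P would advertise caps == 0 and stay usable by userspace that only needs the base specification, while userspace that depends on P2P learns from one check, rather than from probing each transition, that it cannot migrate this device.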