On Wed, Jan 19, 2022 at 10:02:17AM -0700, Alex Williamson wrote:

> > If you insist, but I'd like a good reason because I know it is going
> > to hurt a bunch of people out there. ie can you point at something
> > that is actually practically incompatible?
>
> I'm equally as mystified who is going to break by bumping the sub-type.
> QEMU support is experimental and does not properly handle multiple
> devices. I'm only aware of one proprietary driver that includes
> migration code, but afaik it's not supported due to the status of QEMU.

I do not think "not supported" is accurate

> If a hypervisor vendor has chosen to run with experimental QEMU
> support, it's on them to handle long term support and a transition plan
> and I think that's also easier to do when it's clear whether the device
> is exposing the original migration uAPI or the updated FSM model with
> p2p states and an arc-supported ioctl. Thanks,

I'm not sure I agree with you on this, but I don't want to get into
qemu politics.

So, OK, I drafted a new series that just replaces the whole v1
protocol. If we are agreed on breaking everything then I'd like to
clean the other troublesome bits too, already we have some future
topics on our radar that will benefit from doing this.

The net result is a fairly stunning removal of ~300 lines of ugly
kernel driver code, which is significant considering the whole mlx5
project is only about 1000 lines.

The general gist is to stop abusing a migration region as a system
call interface and instead define two new migration specific ioctls
(set_state and arc_supported). Data transfer flows over a dedicated FD
created for each transfer session with a clear lifecycle instead of
through the region. qemu will discover the new protocol by issuing the
arc_supported ioctl.
(or if we prefer the other shed colour, using the VFIO_DEVICE_FEATURE
ioctl instead of arc_supported)

Aside from being a more unixy interface, an FD can be used with
poll/io_uring/splice/etc and opens up better avenues to optimize for
operating migrations of multiple devices in parallel. It kills a whack
of goofy tricky driver code too.

If you know some reason to be set on using a region for this then
please share, otherwise we'll look at the qemu work required to update
to this and if it is manageable we'll send an RFC.

Thanks,
Jason
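
To make the poll point concrete, here is a minimal userspace sketch. It
is not code from the series and does not use any VFIO interface: the
function name drain_parallel() and the MAX_DEVS limit are hypothetical,
and any readable FD (a pipe, for instance) stands in for a per-device
migration data FD. It also assumes that EOF on an FD means that
device's transfer session has finished, which is a guess about the
lifecycle and not something settled here.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_DEVS 8	/* hypothetical limit, for illustration only */

/*
 * Drain up to MAX_DEVS readable FDs in parallel until each hits EOF.
 * ASSUMPTION: each FD stands in for one device's migration data FD and
 * EOF marks the end of that device's transfer session.
 * totals[i] receives the number of bytes read from fds[i].
 * Returns 0 on success, -1 on error.
 */
int drain_parallel(const int *fds, size_t n, size_t *totals)
{
	struct pollfd pfds[MAX_DEVS];
	char buf[4096];
	size_t open_cnt = n;

	assert(fds && totals);
	if (n > MAX_DEVS)
		return -1;

	for (size_t i = 0; i < n; i++) {
		pfds[i].fd = fds[i];
		pfds[i].events = POLLIN;
		pfds[i].revents = 0;
		totals[i] = 0;
	}

	while (open_cnt > 0) {
		if (poll(pfds, n, -1) < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		for (size_t i = 0; i < n; i++) {
			if (pfds[i].fd < 0)
				continue;	/* already finished */
			if (pfds[i].revents & POLLNVAL)
				return -1;	/* caller passed a bad FD */
			if (!(pfds[i].revents & (POLLIN | POLLHUP | POLLERR)))
				continue;	/* nothing to do yet */

			ssize_t len = read(pfds[i].fd, buf, sizeof(buf));
			if (len < 0) {
				if (errno == EINTR || errno == EAGAIN)
					continue;
				return -1;
			}
			if (len == 0) {
				/* EOF: a negative fd makes poll() skip the slot */
				pfds[i].fd = -1;
				open_cnt--;
				continue;
			}
			totals[i] += (size_t)len;
		}
	}
	return 0;
}
```

A caller would collect one FD per device, pass the array to
drain_parallel(), and read back per-device byte counts from totals[].
With a region based interface there is no FD to hand to poll(), so this
kind of single-loop multiplexing is not available.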