On Mon, Jan 24, 2022 at 11:24:32AM +0100, Cornelia Huck wrote:
> On Wed, Jan 19 2022, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> > So, OK, I drafted a new series that just replaces the whole v1
> > protocol. If we are agreed on breaking everything then I'd like to
> > clean the other troublesome bits too, already we have some future
> > topics on our radar that will benefit from doing this.
>
> Can you share something about those "future topics"? It will help us
> understand what you are trying to do, and maybe others might be going
> into that direction as well.

We are concerned that the region API has no way to notify userspace
that it has data ready. We discussed this before and Alex was thinking
qemu should be busy looping, but we are expecting to have many devices
in a VM at any time and this seems inefficient.

e.g. currently it looks like qemu will enter STOP_COPY serially on every
device, and for something like mlx5 this means it sits around doing
nothing while the snapshot is prepared.

It would be better if qemu put all the devices into STOP_COPY and let
them run their work in the background, then used poll() to wait for
data to come out. Then we can parallelize all the device steps and
support a model where the device is streaming the STOP_COPY data slower
than the CPU can consume it, which we are also thinking about for a
future mlx5 revision.

Basically all of this is to speed up migration for cases with multiple
STOP_COPY type devices.

From what I can see qemu doesn't have the event loop infrastructure to
support this in migration, but we can get the kernel side set up as
part of the simplification process.

> > Aside from being a more unixy interface, an FD can be used with
> > poll/io_uring/splice/etc and opens up better avenues to optimize for
> > operating migrations of multiple devices in parallel. It kills a wack
> > of goofy tricky driver code too.
>
> Cleaner code certainly sounds compelling. It will be easier to review a
> more concrete proposal, though, so I'll reserve judgment until then.

Sure, we have patches now, just going through testing steps. A full
series should be posted in the next few days, but if you want to look
ahead:

https://github.com/jgunthorpe/linux/commits/for-yishai

We have also made the matching qemu changes.

Thanks,
Jason
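
For illustration only, here is a rough userspace sketch of the poll()
model described above: all data FDs are armed at once and each one is
drained as it becomes readable, instead of being handled serially. This
is not the VFIO or qemu interface. Plain pipes stand in for per-device
migration data FDs, and NDEV, drain_all() and demo_parallel_drain() are
hypothetical names made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

#define NDEV 3  /* hypothetical number of devices being migrated */

/*
 * Wait on all FDs at once and read from whichever has data, until
 * every FD has reported end-of-file. Closes each FD at EOF.
 * Returns the total number of bytes read, or -1 on error.
 */
static long drain_all(const int *fds, int n)
{
    struct pollfd pfd[NDEV];
    char buf[4096];
    long total = 0;
    int open_fds = n;

    if (n < 0 || n > NDEV)
        return -1;

    for (int i = 0; i < n; i++) {
        pfd[i].fd = fds[i];
        pfd[i].events = POLLIN;
        pfd[i].revents = 0;
    }

    while (open_fds > 0) {
        if (poll(pfd, (nfds_t)n, -1) < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return -1;
        }
        for (int i = 0; i < n; i++) {
            if (pfd[i].revents & (POLLERR | POLLNVAL))
                return -1;
            if (!(pfd[i].revents & (POLLIN | POLLHUP)))
                continue;           /* nothing ready on this FD yet */

            ssize_t r = read(pfd[i].fd, buf, sizeof(buf));
            if (r < 0)
                return -1;
            if (r == 0) {           /* EOF: this "device" is finished */
                close(pfd[i].fd);
                pfd[i].fd = -1;     /* poll() ignores negative FDs */
                open_fds--;
            } else {
                total += r;
            }
        }
    }
    return total;
}

/*
 * Stand-in for NDEV devices: pipe i is preloaded with (i + 1) chunks
 * of 5 bytes and its write end is closed, so each reader sees a short
 * stream followed by EOF.
 */
static long demo_parallel_drain(void)
{
    static const char chunk[] = "state";
    const ssize_t len = (ssize_t)(sizeof(chunk) - 1);
    int rfds[NDEV];

    for (int i = 0; i < NDEV; i++) {
        int p[2];

        if (pipe(p) < 0)
            return -1;
        for (int j = 0; j <= i; j++)
            if (write(p[1], chunk, (size_t)len) != len)
                return -1;
        close(p[1]);
        rfds[i] = p[0];
    }
    return drain_all(rfds, NDEV);
}
```

In the demo the data is already queued before poll() is called, so it
only exercises the wait-and-drain loop. With real devices producing
STOP_COPY data in the background, the same loop would sleep in poll()
until any one of them had something to read.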