On Thu, Sep 30, 2021 at 04:17:44PM -0400, Laine Stump wrote:
> On 9/30/21 1:09 PM, Laurent Vivier wrote:
> > If we want to save a snapshot of a VM to a file, we used to follow the
> > following steps:
> >
> > 1- stop the VM:
> >    (qemu) stop
> >
> > 2- migrate the VM to a file:
> >    (qemu) migrate "exec:cat > snapshot"
> >
> > 3- resume the VM:
> >    (qemu) cont
> >
> > After that we can restore the snapshot with:
> >   qemu-system-x86_64 ... -incoming "exec:cat snapshot"
> >   (qemu) cont
>
> This is the basics of what libvirt does for a snapshot, and steps 1+2 are
> what it does for a "managedsave" (where it saves the snapshot to disk and
> then terminates the qemu process, for later re-animation).
>
> In those cases, it seems like this new parameter could work for us - instead
> of explicitly pausing the guest prior to migrating it to disk, we would set
> this new parameter to on, then directly migrate-to-disk (relying on qemu to
> do the pause). Care will need to be taken to assure that error recovery
> behaves the same though.

What libvirt does is actually quite different from this in a
significant way.

In the HMP example here, 'migrate' is a blocking command that does not
return until migration is finished. Libvirt uses QMP, and 'migrate'
there is an asynchronous command that merely launches the migration
and returns control to the client.

IOW, what libvirt does is

    stop
    migrate
    while status != failed && status != completed
        query-migrate
        ...also receive any QMP migration events...
        ...possibly modify migration parameters...
    cont

With this pattern I'm not seeing any need for a new migration
parameter for libvirt. The migration status lets us distinguish
when QEMU is in the "waiting for unplug" phase vs the "active"
phase. So AFAICT, libvirt can do:

    migrate
    while status != failed && status != completed
        query-migrate
        ...also receive any QMP migration events...
        if status changed wait-for-unplug to active
            stop
        ...possibly modify migration parameters...
    cont

There is a small window here when the guest CPUs are running but
migration is active. In most cases for libvirt that is harmless.

If there are cases where libvirt needs a strong guarantee to
synchronize the 'stop' with some other operation, then the newly
proposed "pause-vm" parameter has the same problem, as libvirt
can't synchronize against that either.

> There are a couple of cases when libvirt apparently *doesn't* pause the
> guest during the migrate-to-disk, both having to do with saving a coredump
> of the guest. Since I really have no idea of how common/important that is
> (or even if my assessment of the code is correct), I'm Cc'ing this patch to
> libvir-list to make sure it catches the attention of someone who knows the
> answers and implications.

IIUC, the problem with unplug only happens when libvirt pauses the
guest. So surely if there are some scenarios where we're not pausing
the guest, there's no problem to solve for those.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
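
For illustration only, the second sequence above (start the migration,
poll, and issue 'stop' once the status moves from the unplug wait to
active) could be sketched in Python roughly as follows. This is not
libvirt's code. The `qmp` callable is a hypothetical helper that sends
one QMP command and returns its reply, and the status strings
("wait-unplug", "active", "completed", "failed", "cancelled") are
assumed to match QEMU's migration status values. A real client would
also consume QMP migration events rather than rely on polling alone,
since polling can miss a short-lived state.

```python
import time
from typing import Callable, Optional

# Statuses after which polling stops (assumed names, see lead-in).
TERMINAL = {"completed", "failed", "cancelled"}


def save_with_late_stop(
    qmp: Callable[[str, Optional[dict]], dict],
    uri: str,
    poll_interval: float = 0.1,
) -> str:
    """Launch a migration, then stop the guest once unplug has finished.

    `qmp(command, arguments)` is a hypothetical helper, not a libvirt or
    QEMU API; it is expected to return the command's reply as a dict.
    """
    # QMP 'migrate' is asynchronous: it only launches the migration.
    qmp("migrate", {"uri": uri})

    prev = None
    stopped = False
    while True:
        status = qmp("query-migrate", None).get("status")

        # Pause guest CPUs only on the wait-unplug -> active transition.
        if prev == "wait-unplug" and status == "active" and not stopped:
            qmp("stop", None)
            stopped = True

        if status in TERMINAL:
            break
        prev = status
        time.sleep(poll_interval)

    # Resume the guest whether the migration succeeded or failed.
    qmp("cont", None)
    return status
```

Between the transition to "active" and the 'stop' reaching QEMU, the
guest CPUs are still running; that is the small window described above.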