On 30/09/2021 22:17, Laine Stump wrote:
On 9/30/21 1:09 PM, Laurent Vivier wrote:
To save a snapshot of a VM to a file, we used to follow
these steps:
1- stop the VM:
(qemu) stop
2- migrate the VM to a file:
(qemu) migrate "exec:cat > snapshot"
3- resume the VM:
(qemu) cont
After that we can restore the snapshot with:
qemu-system-x86_64 ... -incoming "exec:cat snapshot"
(qemu) cont
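For management software driving QEMU over QMP rather than the HMP monitor, the same stop / migrate / cont sequence could be sketched roughly as below. This only builds the JSON commands and does not talk to a QEMU instance; the function name is made up for illustration. One difference from HMP to keep in mind: the QMP "migrate" command returns immediately, so a real client would have to poll "query-migrate" until the status is "completed" before sending "cont".

```python
import json


def classic_snapshot_commands(path):
    """Build the QMP commands for the stop / migrate-to-file / cont sequence.

    Hypothetical helper for illustration only: it returns the commands
    as dicts and does not connect to a QMP socket.
    """
    return [
        # 1- stop the VM
        {"execute": "stop"},
        # 2- migrate the VM to a file
        {"execute": "migrate", "arguments": {"uri": f"exec:cat > {path}"}},
        # QMP 'migrate' is asynchronous: a real client would repeat this
        # query until the reported status is "completed".
        {"execute": "query-migrate"},
        # 3- resume the VM
        {"execute": "cont"},
    ]


if __name__ == "__main__":
    for cmd in classic_snapshot_commands("snapshot"):
        print(json.dumps(cmd))
```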
These are the basics of what libvirt does for a snapshot, and steps 1+2 are what it does for
a "managedsave" (where it saves the snapshot to disk and then terminates the qemu process,
for later re-animation).
In those cases, it seems like this new parameter could work for us - instead of explicitly
pausing the guest prior to migrating it to disk, we would set this new parameter to on,
then directly migrate-to-disk (relying on qemu to do the pause). Care will need to be
taken to ensure that error recovery behaves the same, though.
In case of error, the VM is restarted, as is done for a standard migration. I can
change that if you need.
Another point: the VM state sent in the migration stream is "paused", which means that the
machine needs to be resumed after the stream is loaded (from the file, or on the destination
in the case of a real migration). This can also be changed to "running", so that the machine
is resumed automatically at the end of the file loading (or real migration).
There are a couple of cases when libvirt apparently *doesn't* pause the guest during the
migrate-to-disk, both having to do with saving a coredump of the guest. Since I really
have no idea of how common/important that is (or even if my assessment of the code is
correct), I'm Cc'ing this patch to libvir-list to make sure it catches the attention of
someone who knows the answers and implications.
It's an interesting point I need to test and think about: in case of a coredump I guess
the machine has crashed and doesn't answer the unplug request, so the failover unplug
cannot be done. For the moment the migration will hang until it is canceled. It can be
annoying if we want to debug the cause of the crash...
But when failover is configured, this doesn't work anymore:
the failover needs to ask the guest OS to unplug the card,
so the machine cannot be paused first.
This patch introduces a new migration parameter, "pause-vm", that
asks the migration to pause the VM during the migration startup
phase after the card is unplugged.
Once the migration is done, we only need to resume the VM with
"cont" and the card is plugged back:
1- set the parameter:
(qemu) migrate_set_parameter pause-vm on
2- migrate the VM to a file:
(qemu) migrate "exec:cat > snapshot"
The primary failover card (VFIO) is unplugged and the VM is paused.
3- resume the VM:
(qemu) cont
The VM restarts and the primary failover card is plugged back.
The VM state sent in the migration stream is "paused", which means
that when the snapshot is loaded, or if the stream is sent to a
destination QEMU, the VM needs to be resumed manually.
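As a rough sketch of how a QMP client might drive the sequence with the new parameter: the block below only builds the JSON commands and does not connect to QEMU. It assumes this patch is applied and that the parameter is spelled "pause-vm" in "migrate-set-parameters", as in the HMP example; the function name is hypothetical. Note that there is no explicit "stop": the pause is left to the migration code, after the failover unplug.

```python
import json


def failover_snapshot_commands(path):
    """Build the QMP commands for a snapshot using the 'pause-vm' parameter.

    Hypothetical helper for illustration only. 'pause-vm' is the
    parameter proposed by this patch and is assumed to use the same
    name in QMP as in HMP.
    """
    return [
        # 1- set the parameter (no explicit 'stop' is sent)
        {"execute": "migrate-set-parameters", "arguments": {"pause-vm": True}},
        # 2- migrate the VM to a file; the primary failover card is
        #    unplugged and the VM is paused by the migration code
        {"execute": "migrate", "arguments": {"uri": f"exec:cat > {path}"}},
        # QMP 'migrate' is asynchronous: a real client would repeat this
        # query until the reported status is "completed".
        {"execute": "query-migrate"},
        # 3- resume the VM; the primary failover card is plugged back
        {"execute": "cont"},
    ]


if __name__ == "__main__":
    for cmd in failover_snapshot_commands("snapshot"):
        print(json.dumps(cmd))
```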
Signed-off-by: Laurent Vivier <lvivier@xxxxxxxxxx>
---
qapi/migration.json | 20 +++++++++++++++---
include/hw/virtio/virtio-net.h | 1 +
hw/net/virtio-net.c | 33 ++++++++++++++++++++++++++++++
migration/migration.c | 37 +++++++++++++++++++++++++++++++++-
monitor/hmp-cmds.c | 8 ++++++++
5 files changed, 95 insertions(+), 4 deletions(-)
...
Thanks,
Laurent