On 03/15/2013 06:17 AM, Nicolas Sebrecht wrote:
> Here are the basic steps. This is still not that simple and there are
> tricky parts along the way.
>
> Usual workflow (use case 2)
> ===========================
>
> Step 1: create an external snapshot for all VM disks (includes VM state).
> Step 2: do the backups manually while the VM is still running (original
> disks and memory state).
> Step 3: save and halt the VM state once the backups are finished.
> Step 4: merge the snapshots (qcow2 disk wrappers) back into their
> backing files.
> Step 5: start the VM. This involves guest downtime, longer according to
> how much state changed since the snapshot.
>
> Restarting from the backup (use case 1)
> =======================================
>
> Step A: shut down the running VM and move it out of the way.
> Step B: restore the backing files and state file from the archives of
> step 2.
> Step C: restore the VM. (still not sure on that one, see below)
>
> I wish to provide a more detailed procedure in the future.
>
>> With new enough libvirt and qemu, it is also possible to use 'virsh
>> blockcopy' instead of snapshots as a backup mechanism, and THAT works
>> with raw images without forcing your VM to use qcow2. But right now, it
>> only works with transient guests (getting it to work for persistent
>> guests requires a persistent bitmap feature that has been proposed for
>> qemu 1.5, along with more libvirt work to take advantage of persistent
>> bitmaps).
>
> Fine. Sadly, my guests are not transient.

Guests can be made temporarily transient. That is, the following
sequence has absolutely minimal guest downtime, and can be done without
any qcow2 files in the mix. For a guest with a single disk, there is
ZERO downtime:

  virsh dumpxml --security-info dom > dom.xml
  virsh undefine dom
  virsh blockcopy dom vda /path/to/backup --wait --verbose --finish
  virsh define dom.xml

For a guest with multiple disks, the downtime can be sub-second, if you
script things correctly (the downtime lasts for the duration between
the suspend and resume, but the steps done in that time are all fast):

  virsh dumpxml --security-info dom > dom.xml
  virsh undefine dom
  virsh blockcopy dom vda /path/to/backup-vda
  virsh blockcopy dom vdb /path/to/backup-vdb
  polling loop - check periodically until 'virsh blockjob dom vda' and
    'virsh blockjob dom vdb' both show 100% completion
  virsh suspend dom
  virsh blockjob dom vda --abort
  virsh blockjob dom vdb --abort
  virsh resume dom
  virsh define dom.xml

In other words, 'blockcopy' is my current preferred method of online
guest backup, even though I'm still waiting for qemu improvements to
make it even nicer.

> It appears I'm in the worst case for all options. :-)

Not if you don't mind being temporarily transient.

>> There's also a proposal on the qemu lists to add a block-backup job,
>> which I would need to expose in libvirt, which has even nicer backup
>> semantics than blockcopy, and does not need a persistent bitmap.
>
> Ok.

For that, I will probably be adding a 'virsh blockbackup dom vda'
command.

>>> Surprising! I would have expected files to be stored in
>>> virtuals/images. This is not the point for now, let's continue.
>>
>> Actually, it would probably be better if libvirtd errored out on
>> relative path names (relative to what? libvirtd runs in '/', and has no
>> idea what directory virsh was running in), and therefore virsh should be
>> nice and convert names to absolute before handing them to libvirtd.
>
> Ok. I guess an error for relative paths would be fine to avoid
> unexpected paths. All embedded consoles I know support relative paths
> (e.g.: python, irb, rails console, etc).
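The multi-disk sequence above, including the polling loop, can be
scripted. Here is a minimal sketch under some assumptions: the guest is
named "dom", its disks are vda and vdb, and the /backup paths are
placeholders. By default it runs in dry-run mode (commands are printed,
not executed), so the sequence can be inspected without a real guest;
note that 'virsh blockjob' output formatting can vary between libvirt
versions, so the progress check is deliberately loose:

```shell
#!/bin/sh
# Sketch of a near-zero-downtime multi-disk blockcopy backup.
# Assumptions: guest "dom" with disks vda/vdb; paths are placeholders.
DRYRUN=${DRYRUN:-1}

run() {
    # In dry-run mode, print the virsh command instead of executing it.
    if [ "$DRYRUN" = 1 ]; then echo "virsh $*"; else virsh "$@"; fi
}

backup_dom() {
    if [ "$DRYRUN" = 1 ]; then
        echo "virsh dumpxml --security-info dom > dom.xml"
    else
        virsh dumpxml --security-info dom > dom.xml
    fi
    run undefine dom                       # guest becomes transient
    run blockcopy dom vda /backup/vda.img  # start the mirror jobs
    run blockcopy dom vdb /backup/vdb.img

    # Poll until both mirror jobs report 100 % completion.
    [ "$DRYRUN" = 1 ] && echo "# poll 'virsh blockjob dom vda/vdb' until 100 %"
    until [ "$DRYRUN" = 1 ] ||
          { virsh blockjob dom vda | grep -q '100 %' &&
            virsh blockjob dom vdb | grep -q '100 %'; }; do
        sleep 5
    done

    # Downtime is only the window between suspend and resume.
    run suspend dom
    run blockjob dom vda --abort           # end the mirror, keep the copy
    run blockjob dom vdb --abort
    run resume dom
    run define dom.xml                     # guest is persistent again
}

backup_dom
```

Run with DRYRUN=0 only after adapting the names and paths; aborting
both jobs inside the suspend/resume window is what keeps the two disk
copies consistent with each other.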
virsh would still support relative paths; it's just the underlying
libvirtd that should require absolute ones (in other words, the UI
should do the normalizing, so that the RPC is unambiguous; right now
the RPC is doing the normalization, but to the wrong directory, because
it doesn't know what the working directory of the UI is). This is a
bug, but easy enough to fix, and in the meantime, easy enough for you
to work around (use absolute instead of relative paths, until libvirt
1.0.4 is out).

> Here is where we are in the workflow (step C) for what we are talking
> about:
>
> Step 1: create external snapshot for all VM disks (includes VM state).
> Step 2: do the backups manually while the VM is still running (original
> disks and memory state).

During this step, the qcow2 files created in step 1 are growing in
proportion to the amount of changes done in the guest; obviously, the
faster you can complete it, the smaller the deltas will be, and the
faster your later merge steps will be. Since the later merge steps have
to be done while the guest is halted, it's good to keep small size in
mind. More on this thought below...

> Step 3: save and halt the vm state once backups are finished.

By 'halt the vm state', do you mean power it down, so that you would be
doing a fresh boot (aka 'virsh shutdown dom', do your work including
'virsh edit dom', then 'virsh start dom')? Or do you mean 'take yet
another snapshot', so that you stop qemu, manipulate things to point to
the right files, then start a new qemu that picks up at the same point
where the running guest left off (aka 'virsh save dom file', do your
work including 'virsh save-image-edit file', then 'virsh restore
file')?

My advice: don't use managedsave.
At this point, it just adds more confusion, and you are better off
directly using 'virsh save' (managedsave is just a special case of
'virsh save', where libvirt picks the file name on your behalf, and
where 'virsh start' is smart enough to behave like 'virsh restore file'
on that managed name - but that extra magic in 'virsh start' makes life
that much harder when you want to modify what the guest will start
with).

> Step 4: merge the snapshots (qcow2 disk wrappers) back to their
> backing file.

This step is done with raw qemu-img commands at the moment, and takes
time proportional to the size of the qcow2 data.

> Step 5: start the VM.

Based on how you stopped the vm in step 3, this is either via 'virsh
start' (assuming you did 'virsh shutdown'), or via 'virsh restore'
(assuming you did 'virsh save'; with the special case that if you did
'virsh managedsave', 'virsh start' behaves like 'virsh restore').

...As mentioned above, the time taken in step 2 can affect how big the
delta is, and therefore how long step 4 lasts (while the guest is
offline).
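The two ways of pairing step 3 with step 5 can be sketched side by
side. This is a dry-run illustration (commands are printed, not run);
the guest name "dom" and the /tmp/dom.save path are placeholders, and
the save-file path is your choice when you avoid managedsave:

```shell
#!/bin/sh
# Two ways to stop at step 3 and restart at step 5 (sketch only).

# Option A: full power-down, then a fresh boot.
offline_cycle() {
    echo "virsh shutdown dom"
    echo "virsh edit dom                      # adjust disk paths if needed"
    echo "virsh start dom"
}

# Option B: save the memory state, resume exactly where it left off.
# Using plain 'virsh save' (not managedsave) keeps the file under your
# control and keeps 'virsh start' behaving predictably.
save_restore_cycle() {
    echo "virsh save dom /tmp/dom.save"
    echo "virsh save-image-edit /tmp/dom.save # adjust the embedded XML"
    echo "virsh restore /tmp/dom.save"
}

offline_cycle
save_restore_cycle
```

Option A costs a full guest boot; option B costs only the save/restore
time but requires the saved image's XML to match the disk files you
restored.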
If your original disk is huge, and copying it to your backup takes a
long time, it may pay to do an iterative approach:

  start with a raw image:
    raw.img
  create the external snapshot at the point you care about:
    raw.img <- snap1.qcow2
  transfer raw.img and the vmstate file to backup storage, taking as
  long as needed (gigabytes of data, so on the order of minutes, during
  which the qcow2 file can build up to megabytes in size):
    raw.img <- snap1.qcow2
  create another external snapshot, but this time with --disk-only
  --no-metadata (we don't plan on reverting to this point in time):
    raw.img <- snap1.qcow2 <- snap2.qcow2
  use 'virsh blockcommit dom vda --base /path/to/raw --top
  /path/to/snap1 --wait --verbose'; this takes time for megabytes of
  storage, but not gigabytes, so it is faster than the time to copy
  raw.img, which means snap2.qcow2 will hold less delta data than
  snap1.qcow2:
    raw.img <- snap2.qcow2
  now stop the guest, commit snap2.qcow2 into raw.img, and restart the
  guest

By doing an iteration, you've reduced the size of the file that has to
be committed while the guest is offline, and may be able to achieve a
noticeable reduction in guest downtime.

> <For whatever reason, I have to restore the backup from step 2>
> Step A: shut down the running VM and move it out of the way.

Here, 'virsh destroy' is fine, if you don't care about maintaining
parallel branches of execution past the snapshot point. If you DO plan
on toggling between two parallel branches from a common snapshot, be
sure to take another snapshot at this point in time.

> Step B: restore the backing files and state file from the archives of
> step 2.
> Step C: restore the VM.

Here, you need to use 'virsh restore' on the file that holds the vm
state from the point of the snapshot.

> So, yes: this is the memory state from the point at which the snapshot
> was taken but I clearly expect it to point to the backing file only.

You can double-check what it points to with 'virsh save-image-dumpxml',
to make sure.
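The iterative approach above can be sketched as a script. This is a
dry-run sketch (commands are printed, not executed); the guest "dom",
disk "vda", and all /vm and /backup paths are placeholder assumptions,
and the exact snapshot-create-as spelling may need adjusting for your
libvirt version:

```shell
#!/bin/sh
# Sketch of the iterative backup: two snapshots so that only a small
# second delta needs committing while the guest is offline.
run() { echo "virsh $*"; }

iterative_backup() {
    # raw.img <- snap1.qcow2 : the snapshot point we care about
    run snapshot-create-as dom snap1 --memspec file=/vm/dom.vmstate \
        --diskspec vda,file=/vm/snap1.qcow2

    # Slow copy (gigabytes) while the guest keeps running.
    echo "cp /vm/raw.img /backup/raw.img"
    echo "cp /vm/dom.vmstate /backup/dom.vmstate"

    # raw.img <- snap1.qcow2 <- snap2.qcow2 : throwaway wrapper,
    # no libvirt metadata since we will never revert to it.
    run snapshot-create-as dom snap2 --disk-only --no-metadata \
        --diskspec vda,file=/vm/snap2.qcow2

    # Collapse snap1 into raw.img online; only megabytes to move,
    # so snap2 stays small.
    run blockcommit dom vda --base /vm/raw.img --top /vm/snap1.qcow2 \
        --wait --verbose

    # Short offline window: save, commit the small snap2 delta, point
    # the saved image back at raw.img, and resume.
    run save dom /vm/dom.save
    echo "qemu-img commit /vm/snap2.qcow2"
    echo "virsh save-image-edit /vm/dom.save  # vda back to raw.img"
    run restore dom.save 2>/dev/null || run restore /vm/dom.save
}

iterative_backup
```

The offline window now covers only the snap2 commit plus the XML edit,
rather than a commit of everything that changed during the long copy.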
>> Yeah, again a known limitation. Once you change state behind libvirt's
>> back (because libvirt doesn't yet have snapshot-revert wired up to do
>> things properly), you generally have to 'virsh snapshot-delete
>> --metadata VM snap1' to tell libvirt to forget the snapshot existed, but
>> without trying to delete any files, since you did the file deletion
>> manually.
>
> Good, this is what I was missing.

In fact, if you KNOW you don't care about libvirt tracking snapshots,
you can use 'virsh snapshot-create[-as] --no-metadata dom ...' in the
first place, so that you get the side effects of external file creation
without any of the (soon-to-be-useless) metadata.

>> Try 'virsh managedsave-remove VM' to get the broken managedsave image
>> out of the way.
>
> Well, no. I would expect to come back to the exact same environment as
> after the backup. To do so, I expect to be able to do steps 3, 4 and 5
> cleanly.

Again, my recommendation is to NOT use managedsave. It changes what
'virsh start' will do: if there is a managedsave image present, it
takes precedence over anything you do in 'virsh edit', unless you use
'virsh start --force-boot' to intentionally discard the managedsave
image. On the other hand, since the managedsave image is the only
record of the running vm state, you don't want it discarded, which
means your attempts to use 'virsh edit' are useless, and you are forced
to use 'virsh save-image-edit' on a file that should be internal to
libvirt.

> Step 3: save and halt the vm state once backups are finished.
> Step 4: merge the snapshots (qcow2 disk wrappers) back to their
> backing file.
> Step 5: start the VM.

Sounds like we discussed this up above - you can either do this offline
(virsh shutdown, edit, virsh start) or on a running image (virsh save,
edit, virsh restore).
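The two ways of keeping libvirt's snapshot metadata out of the picture
can be sketched together. A dry-run illustration (commands printed, not
executed); the guest "dom", snapshot name "snap1", and the file path
are placeholders:

```shell
#!/bin/sh
# Sketch: avoid stale snapshot metadata when managing files by hand.
run() { echo "virsh $*"; }

# Up front: create the external snapshot file with no metadata at all,
# so there is nothing for libvirt to forget later.
no_metadata_snapshot() {
    run snapshot-create-as dom --no-metadata --disk-only \
        --diskspec vda,file=/vm/snap1.qcow2
}

# After the fact: tell libvirt to forget a snapshot it is tracking,
# without touching any files (the files were already removed by hand).
forget_snapshot() {
    run snapshot-delete dom snap1 --metadata
}

no_metadata_snapshot
forget_snapshot
```

Use the first form when you know in advance that libvirt's tracking is
useless to you; use the second to clean up after manual file surgery.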
>>> # virsh save-image-edit /var/lib/libvirt/qemu/save/VM.save
>>> <virsh edit VM to come back to vda -> VM.raw>
>>> error: operation failed : new xml too large to fit in file
>>> #
>>
>> Aha - so you ARE brave, and DID try to edit the managedsave file. I'm
>> surprised that you really hit a case where your edits pushed the XML
>> over a 4096-byte boundary. Can you come up with a way to (temporarily)
>> use shorter names, such as having /VM-snap1.img be a symlink to the
>> real file, just long enough for you to get the domain booted again?
>
> Excellent. I don't know why I didn't think about trying that. Tested
> and the symlink trick works fine. I had to change the disk format in
> the memory header, of course.
>
> BTW, I guess I can prevent that by giving an absolute path for the
> snapshot longer than the original disk path.

Yeah, being more careful about the saved image that you create in the
first place will make it less likely that changing the save image adds
enough content to push the XML over a 4096-byte boundary.

>> Also, I
>> hope that you did your experimentation on a throwaway VM, and not on a
>> production one, in case you did manage to fubar things to the point of
>> data corruption by mismatching disk state vs. memory state.
>
> I did everything in a testing environment where breaking guests or
> hypervisor does not matter.

Always good advice, when trying something new and potentially
dangerous :)

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org