Re: RFC: mirrored live block migration in libvirt 0.9.11

Eric Blake <eblake@xxxxxxxxxx> · Wed, 14 Mar 2012 03:38:26 -0600

On 03/14/2012 02:16 AM, Paolo Bonzini wrote:
>>  <domainsnapshot>
>>    <disks>
>>      <disk name='/src/base.img' snapshot='external'>
>>        <source file='/src/snap.img'/>
>>        <mirror file='/dest/snap.img'/>
>>      </disk>
>>    </disks>
>>  </domainsnapshot>
>> would create a new libvirt snapshot object with /src/snap.img as the
>> read-write new image, and /dest/snap.img as the new write-only mirror.
>> On success, this rewrites the domain's live XML to point to
>> /src/snap.img as its current file.
> 
> This is an awfully low-level API; you're designing for oVirt rather than
> for everything else.  The problem here is twofold:
> 
> 1) you're defining a snapshot that cannot be started without losing the
> mirrors.

Yes, I don't see any way around that - this proposal will only let you
create a mirror with a live qemu session; the moment you start a new
qemu session, you have lost the mirroring.  I agree that when mirroring
is more mature (so that the qemu command line can start a domain with
mirroring from the get-go), libvirt will probably need nicer API to
expose that.  For that matter, someday libvirt needs to expose the
entire backing chain in the <domain> XML, including any mirroring,
rather than just leaving mirroring to this one-off solution for oVirt in
<domainsnapshot>, but that's a bigger project.

> 
> 2) in case the snapshotting is aborted early for any reason, oVirt has
> to do a rebase operation manually.  This is currently O(size-of-disk),
> not O(changes-in-the-last-image), so it wastes both disk space and time.

I don't follow the argument for this.  It may be a valid complaint, but
I'm not yet seeing why you think oVirt has to do a rebase operation
manually, or why that operation will cost O(disk) rather than
O(changes).  If I have:

base <- snap1

and request a snapshot that mirrors to snap2 in two locations, but abort
half-way through, then I can just call
virDomainSnapshotDelete(VIR_DOMAIN_SNAPSHOT_DELETE_METADATA) which makes
libvirt forget that it attempted to take a snapshot, but without losing
the XML that says that the disk is now based on snap2.  That means
restarting the domain would use:

base <- snap1 <- snap2

as its backing file, and virDomainBlockRebase can be used to initiate a
'block_stream' to collapse it back to a shorter backing chain.

> 
> If it works, I cannot really say "don't do it", but I think the oVirt
> mirrored snapshots idea is a dead-end and a workaround for lack of block
> device streaming (which is now supported).  You could have a simpler,
> high-level API based on streaming rather than snapshotting.  So, if you
> have /src/disk.img as your image, you would have a new API:
> 
>   virDomainBlockCopy(dom, "disk",
>                      "/dst/disk.img", "/src/base.img",
>                      bandwidth, flags)

Yes, a new API would ultimately be nicer, and allow us to expose more
features of the new qemu 'transaction' command.  I will probably
eventually add something along those lines, but it goes against my
stated goal of implementing a first-cut working solution for oVirt that
uses just the 0.9.10 API.  In other words, for backporting purposes, I'd
like a solution that doesn't require a .so bump, even if it is ugly;
while saving a new API until we have a bit more experience in all the
various cases that users want to make sure the new API can more
conveniently cover all of those cases.

> 
>> Finally, virDomainSnapshotDelete will learn a new flag,
>> VIR_DOMAIN_SNAPSHOT_DELETE_REOPEN_MIRROR, which says that the libvirt
>> snapshot object will be deleted, but only after first calling the qemu
>> 'drive-reopen' monitor command for all disks that had a <mirror> in the
>> associated snapshot object.  That is, for the above example, this would
>> reopen the disk from it's current read-write of /src/snap.img over to
>> the second storage domain's /dest/snap.img with it's accompanying
>> mirrored backing chain.  On success, this rewrites the domain's live XML
>> to point to the just-opened mirror location.  This flag will fail if the
>> libvirt snapshot being deleted is not the current image, or if the
>> snapshot being deleted does not have any mirrored disks.
> 
> I think you also need VIR_DOMAIN_SNAPSHOT_DELETE_REMOVE_MIRROR, to be
> used in case of abort so that the domain can actually be started.  Or it
> could be an event MIRROR_DROPPED or something like that.

Good call.  VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY says to drop
libvirt's notion of the snapshot object, but it won't stop qemu from
mirroring; so an additional flag that tells libvirt to 'drive-reopen'
back to the source to discard any mirroring would be handy.

I'm not sure whether an event for MIRROR_DROPPED is needed; I guess the
only time a mirror is dropped without an explicit action that causes a
'drive-reopen' is where you try to restart a qemu process.  But since
oVirt is using transient domains, that means that destroying a running
qemu process and then starting a new transient domain on the same image
loses all snapshot information anyway, so oVirt should already be aware
that it has lost the mirroring information as part of tearing down and
rebuilding a transient domain.  I'll keep the idea of an event in mind,
but I'm not sure I see any place where it would be useful.  But it does
point out that I should probably either prevent the use of a
<domainsnapshot> with <mirror> on persistent domains, or at least
prevent the use of 'virsh start' on such a persistent domain until the
snapshot has been deleted.

-- 
Eric Blake   eblake@xxxxxxxxxx    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment:
signature.asc

Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list