Re: [PATCHv5 10/23] blockjob: support pivot operation on cancel

Jiri Denemark <jdenemar@xxxxxxxxxx> · Thu, 19 Apr 2012 22:17:14 +0200

On Mon, Apr 16, 2012 at 23:06:01 -0600, Eric Blake wrote:
> This is the bare minimum to end a copy job (of course, until a
> later patch adds the ability to start a copy job, this patch
> doesn't do much in isolation; I've just split the patches to
> ease the review).
> 
> This patch intentionally avoids SELinux, lock manager, and audit
> actions.  Also, if libvirtd restarts at the exact moment that a
> 'drive-reopen' is in flight, the only proper way to detect the
> outcome of that 'drive-reopen' would be to first pass in a witness
> fd with 'getfd', then at libvirtd restart, probe whether that file
> is still empty.  This patch is enough to test the common case of
> success when used correctly, while saving the subtleties of proper
> cleanup for worst-case errors for later.
> 
> Once a mirror job is started, cancelling the job safely reverts to
> the source disk, regardless of whether the destination is in
> phase 1 (streaming, in which case the destination is worthless) or
> phase 2 (mirroring, in which case the destination is synced up to
> the source at the time of the cancel).  Our existing code does just
> fine in either phase, other than some bookkeeping cleanup.
> 
> Pivoting the job requires the use of the new 'drive-reopen' command.
> Here, failure of the command is potentially catastrophic to the
> domain, since the initial qemu implementation rips out the old disk
> before attempting to open the new one; qemu will attempt a recovery
> path of retrying the reopen on the original source, but if that also
> fails, the domain is hosed, with nothing libvirt can do about it.
> If qemu 1.2 ever adds 'drive-reopen' inside 'transaction', then the
> problem will no longer exist (a transaction promises not to close
> the old file until after the new file is proven to work), at which
> point we would add a VIR_DOMAIN_REBASE_COPY_ATOMIC that fails up
> front if we detect an older qemu with the risky drive-reopen.
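
The outcomes described in the paragraph above can be laid out as a small
decision model. This is only a sketch for discussion, not libvirt or qemu
code: every name in it (copy_phase, pivot_result, qemu_model, pivot) is
hypothetical, the capability bit stands in for a possible future
'drive-reopen' inside 'transaction', and the check that a pivot only makes
sense in phase 2 is an assumption drawn from the note that a phase 1
destination is worthless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two phases of a copy job. */
enum copy_phase { PHASE_STREAMING = 1, PHASE_MIRRORING = 2 };

enum pivot_result {
    PIVOT_OK,           /* domain now runs on the destination */
    PIVOT_NOT_READY,    /* still streaming; nothing was attempted (assumption) */
    PIVOT_REFUSED,      /* atomic pivot requested but not available */
    PIVOT_ON_SOURCE,    /* reopen failed, domain is still on the source */
    PIVOT_DOMAIN_LOST   /* reopen and recovery both failed */
};

/* Hypothetical stand-in for what qemu can do and how its opens turn out. */
struct qemu_model {
    bool reopen_in_transaction; /* assumed future capability */
    bool dest_opens;            /* would opening the destination succeed? */
    bool source_reopens;        /* would the recovery reopen succeed? */
};

static enum pivot_result
pivot(const struct qemu_model *q, enum copy_phase phase, bool require_atomic)
{
    assert(q != NULL);

    if (phase != PHASE_MIRRORING)
        return PIVOT_NOT_READY;

    /* Fail up front rather than risk the non-transactional reopen. */
    if (require_atomic && !q->reopen_in_transaction)
        return PIVOT_REFUSED;

    if (q->dest_opens)
        return PIVOT_OK;

    /* Transactional reopen never closed the old file, so nothing is lost. */
    if (q->reopen_in_transaction)
        return PIVOT_ON_SOURCE;

    /* Old disk was already ripped out; all that is left is the retry. */
    return q->source_reopens ? PIVOT_ON_SOURCE : PIVOT_DOMAIN_LOST;
}
```

The only branch that loses the domain is the last one, which is the case
an up-front failure on older qemu would be meant to exclude.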
> 
> Interesting side note: while snapshot-create --disk-only creates a
> copy of the disk at a point in time by moving the domain on to a
> new file (the copy is the file now in the just-extended backing
> chain), blockjob --abort of a copy job creates a copy of the disk
> while keeping the domain on the original file.  There may be
> potential improvements to the snapshot code to exploit block copy
> over multiple disks all at one point in time.  And, if
> 'block-job-cancel' were made part of 'transaction', you could
> copy multiple disks at the same point in time without pausing
> the domain.  This also implies we may want to add a --quiesce flag
> to the pivot operation, so that when breaking a mirror, the side
> of the mirror that we are abandoning is at least in a stable state
> with regard to guest I/O.
> 
> * src/qemu/qemu_driver.c (qemuDomainBlockJobAbort): Accept new flag.
> (qemuDomainBlockPivot): New helper function.
> (qemuDomainBlockJobImpl): Implement it.
> ---
> 
> was 11/18 in v4
> v5: no real change, improve commit message
> 
>  src/qemu/qemu_driver.c |  106 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 105 insertions(+), 1 deletions(-)

OK

Jirka
