Re: [PATCHv9 4/9] blockjob: support pivot operation on cancel

Peter Krempa <pkrempa@xxxxxxxxxx> · Fri, 26 Oct 2012 15:07:26 +0200

On 10/23/12 04:10, Eric Blake wrote:
This is the bare minimum to end a copy job (of course, until a
later patch adds the ability to start a copy job, this patch
doesn't do much in isolation; I've just split the patches to
ease the review).

This patch intentionally avoids SELinux, lock manager, and audit
actions.  Also, if libvirtd restarts at the exact moment that a
'block-job-complete' is in flight, the proposed proper way to
detect the outcome of that would be with a persistent bitmap and
some additional query commands for qemu 1.3 when libvirtd
restarts (RHEL 6.3 is out of luck).  This patch is enough to
test the common case of success when used correctly, while saving
the subtleties of proper cleanup for worst-case errors for later.

Once a mirror job has been started, cancelling it safely reverts to
the source disk, regardless of whether the destination is in
phase 1 (streaming, in which case the destination is worthless) or
phase 2 (mirroring, in which case the destination is synced up to
the source at the time of the cancel).  Our existing code does just
fine in either phase, other than some bookkeeping cleanup; this
implements live block copy.
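
As a rough illustration of the cancel semantics described above, the
following self-contained C sketch models the two mirror phases and what a
cancel leaves behind. It is not taken from the patch: the names
mirror_phase, cancel_result and mirror_cancel are hypothetical and do not
exist in libvirt or qemu.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not libvirt code: the two phases of a mirror job. */
typedef enum {
    MIRROR_PHASE_STREAMING = 1, /* phase 1: destination still incomplete */
    MIRROR_PHASE_MIRRORING = 2  /* phase 2: destination tracks the source */
} mirror_phase;

/* What is left behind once the job has been cancelled. */
typedef struct {
    bool domain_on_source;   /* guest keeps running on the original disk */
    bool destination_usable; /* destination is a point-in-time copy */
} cancel_result;

/* Cancelling always leaves the guest on the source disk; only the value
 * of the destination file depends on the phase the job had reached. */
static cancel_result
mirror_cancel(mirror_phase phase)
{
    cancel_result res;

    assert(phase == MIRROR_PHASE_STREAMING ||
           phase == MIRROR_PHASE_MIRRORING);

    res.domain_on_source = true;
    res.destination_usable = (phase == MIRROR_PHASE_MIRRORING);
    return res;
}
```

In this model a cancel during phase 1 yields a worthless destination,
while a cancel during phase 2 yields a copy synced to the time of the
cancel; in both cases the domain stays on the source.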

Pivoting the job requires the qemu 1.3 'block-job-complete' (safe)
or the RHEL 6.3 '__com.redhat_drive-reopen' command (where failure
of the command is potentially catastrophic to the domain, since
it rips out the old disk before attempting to open the new one).

Ideas for future enhancements via new flags:

Since qemu 1.3 is safer than RHEL 6.3, it may be worth adding a
VIR_DOMAIN_REBASE_COPY_ATOMIC flag that fails up front if we
detect an older qemu with the risky pivot operation.
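
A minimal sketch of how such an up-front check might interact with the
choice of pivot command, assuming the capability probing has already
happened. The struct pivot_caps, its fields, the require_atomic parameter
and pivot_command are all hypothetical names used for illustration; only
the two monitor command names come from the text above, and libvirt's
real capability handling looks different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability summary; not libvirt's real capability bits. */
typedef struct {
    bool has_block_job_complete; /* qemu 1.3: safe pivot */
    bool has_drive_reopen;       /* RHEL 6.3: risky pivot */
} pivot_caps;

/* Return the monitor command to use for a pivot, or NULL if the request
 * must be refused up front.  require_atomic stands in for the proposed
 * VIR_DOMAIN_REBASE_COPY_ATOMIC flag. */
static const char *
pivot_command(const pivot_caps *caps, bool require_atomic)
{
    assert(caps != NULL);

    /* Prefer the safe command whenever it is available. */
    if (caps->has_block_job_complete)
        return "block-job-complete";

    /* Fall back to the risky command only if the caller accepts that
     * a failure may be catastrophic to the domain. */
    if (caps->has_drive_reopen && !require_atomic)
        return "__com.redhat_drive-reopen";

    return NULL;
}
```

With this shape, a caller that asks for the atomic behaviour on an older
qemu gets NULL before any monitor command is issued, which matches the
"fails up front" intent of the proposed flag.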

Interesting side note: while snapshot-create --disk-only creates a
copy of the disk at a point in time by moving the domain on to a
new file (the copy is the file now in the just-extended backing
chain), blockjob --abort of a copy job creates a copy of the disk
while keeping the domain on the original file.  There may be
potential improvements to the snapshot code to exploit block copy
over multiple disks all at one point in time.  And, if
'block-job-cancel' were made part of 'transaction', you could
copy multiple disks at the same point in time without pausing
the domain.  This also implies we may want to add a --quiesce flag
to virDomainBlockJobAbort, so that when breaking a mirror (whether
by cancel or pivot), the side of the mirror that we are abandoning
is at least in a stable state with regard to guest I/O.

* src/qemu/qemu_driver.c (qemuDomainBlockJobAbort): Accept new flag.
(qemuDomainBlockPivot): New helper function.
(qemuDomainBlockJobImpl): Implement it.
---
  src/qemu/qemu_driver.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++-
  1 file changed, 107 insertions(+), 1 deletion(-)

ACK if the RHEL-specific parts will be pulled in; otherwise this patch
will require a few changes.

Peter
