On 01/24/2012 01:21 PM, Michal Privoznik wrote: > With this flag, virDomainSnapshotCreate will use fs-freeze and > fs-thaw guest agent commands to quiesce guest's disks. > --- > include/libvirt/libvirt.h.in | 4 ++ > src/libvirt.c | 6 +++ > src/qemu/qemu_driver.c | 87 ++++++++++++++++++++++++++++++++++++++---- > 3 files changed, 89 insertions(+), 8 deletions(-) Late review, but I just noticed this since I'm now touching the same area of code: > +++ b/src/libvirt.c > @@ -16602,6 +16602,12 @@ virDomainSnapshotGetConnect(virDomainSnapshotPtr snapshot) > * inconsistent (as if power had been pulled), and specifying this > * with the VIR_DOMAIN_SNAPSHOT_CREATE_HALT flag risks data loss. > * > + * If @flags includes VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE, then the > + * libvirt will attempt to use guest agent to freeze and thaw all > + * file systems in use within domain OS. However, if the guest agent > + * is not present, an error is thrown. Moreover, this flag requires > + * VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY to be passed as well. [note to self - if I ever get disk-only snapshots of offline guests to work by using qemu-img, then quiesce can be ignored, since an offline guest is by definition quiesced] > +static int > +qemuDomainSnapshotFSThaw(struct qemud_driver *driver, > + virDomainObjPtr vm) { > + qemuDomainObjPrivatePtr priv = vm->privateData; > + int thawed; > + > + if (priv->agentError) { > + qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", > + _("QEMU guest agent is not " > + "available due to an error")); > + return -1; > + } > + if (!priv->agent) { > + qemuReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", > + _("QEMU guest agent is not configured")); > + return -1; > + } > + > + qemuDomainObjEnterAgent(driver, vm); > + thawed = qemuAgentFSThaw(priv->agent); Does qemuDomainObjEnterAgent and/or the common code that actually tries to pass a message over the agent socket ensure that the guest is still running? If the guest is online, but paused, then we know for a fact that the guest agent will not respond in a timely manner. Rather than make all callers check for this situation, I'm thinking that it would make sense to check that the vm is running prior to issuing a guest-agent command, and to issue an early error, rather than waiting for a timeout when we finally realize the guest never responded. This is particularly important for the quiesce command. Consider what happens if we don't check for paused first: - guest is paused - we issue the quiesce command which gets buffered up, but it times out without being delivered - we fail the command - the guest is resumed, and now consumes the data that was pending on the monitor - the guest goes into quiesce, but our command is no longer around to thaw the guest. I think we need to ensure that any failure path where we issued a quiesce command, _even if the quiesce command failed due to a timeout_, we _also_ issue a thaw command. If the quiesce command failed, we also expect the thaw command to fail, but if the failure was due to the guest being paused, now when the guest wakes up, it will have two pending commands on the bus, and will temporarily quiesce but then thaw right back. Of course, a paused guest isn't the only reason for a timeout, so we need to make sure of paired calls even if we also teach qemDomainObjEnterAgent to make both freeze and thaw be instant fails on a paused guest. > @@ -9773,6 +9823,12 @@ qemuDomainSnapshotCreateDiskActive(virConnectPtr conn, > > > if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { > + if ((flags & VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE) && > + (qemuDomainSnapshotFSFreeze(driver, vm) < 0)) { > + /* helper reported the error */ > + goto endjob; > + } Note that if the guest is paused instead of running, and the QUIESCE flag is present, we fail to quiesce the disk. That means the snapshot we took is not stable. Okay, we only documented the QUIESCE flag as a best-effort (and anything that relies on the guest agent is inherently best-effort, as we can't trust the guest), but I think it makes more sense to _always_ attempt the FSFreeze on an active guest, and abort the snapshot if quiesce fails (including but not limited to the case where the guest is paused), rather than only attempting the quiesce on a running guest. > + > /* In qemu, snapshot_blkdev on a single disk will pause cpus, > * but this confuses libvirt since notifications are not given > * when qemu resumes. And for multiple disks, libvirt must > @@ -9840,13 +9896,20 @@ qemuDomainSnapshotCreateDiskActive(virConnectPtr conn, > } > > cleanup: > - if (resume && virDomainObjIsActive(vm) && > - qemuProcessStartCPUs(driver, vm, conn, > - VIR_DOMAIN_RUNNING_UNPAUSED, > - QEMU_ASYNC_JOB_NONE) < 0 && > - virGetLastError() == NULL) { > - qemuReportError(VIR_ERR_OPERATION_FAILED, "%s", > - _("resuming after snapshot failed")); > + if (resume && virDomainObjIsActive(vm)) { > + if (qemuProcessStartCPUs(driver, vm, conn, > + VIR_DOMAIN_RUNNING_UNPAUSED, > + QEMU_ASYNC_JOB_NONE) < 0 && > + virGetLastError() == NULL) { > + qemuReportError(VIR_ERR_OPERATION_FAILED, "%s", > + _("resuming after snapshot failed")); > + goto endjob; > + } > + > + if ((flags & VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE) && > + qemuDomainSnapshotFSThaw(driver, vm) < 0) { > + /* helper reported the error */ > + } Again, we need go guarantee that if qemuDomainSnapshotFSFreeze was called, then this is also called; but that we only pay attention to an error here according to whether the freeze was reported as successful. I'm also worried about the atomic aspect - if we take the snapshot, and it is only the thaw that fails, we > + > + if ((flags & VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE) && > + !(flags & VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY)) { > + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", > + _("quiesce requires disk-only")); > + return NULL; > + } This restriction makes sense for now, but I think that it may be worth lifting someday if I figure out how to do a RAM snapshot with external disk snapshots. -- Eric Blake eblake@xxxxxxxxxx +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Attachment:
signature.asc
Description: OpenPGP digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list