On 07/15/2013 03:04 PM, Richard W.M. Jones wrote:
> On Mon, Jul 15, 2013 at 05:57:12PM +0800, Fam Zheng wrote:
>> Hi all,
>>
>> QEMU-KVM BZ 955734, and libvirt BZ 905125 are about feature "Read-only
>> point-in-time throwaway snapshot". The development is ongoing on
>> upstream, which implements the core functionality by QMP command
>> drive-backup. I want to demonstrate the HMP/QMP commands here for image
>> fleecing tasks (again) and make sure this interface looks ready and
>> satisfying from Libvirt point of view.

I'm wondering if we can still get something committed in time for the
freeze for 1.1.1. At this point, we're close enough to the freeze, and
with no patches submitted in libvirt and the qemu design still under
discussion, that I'm worried about whether we are rushing things too
much to take a new interface this late in a libvirt release cycle, or
whether we should wait until after 1.1.1 before attempting to add
things. On the other hand, if we can agree on a sane design now (or at
least before rc2, if we miss rc1), then we can commit to that design for
this libvirt release, and downstream distros can use libvirt 1.1.1 as a
starting point for rebases without worrying about so-name compatibility,
by signing up for the effort of backporting the actual implementation
from future upstream qemu and libvirt releases.

We've done the approach of an early commit to a new API in the past,
even if I'm not necessarily the biggest fan of the approach. For
example, we chose to add virDomainBlockRebase to libvirt 0.9.10 (commit
9f902a2, when qemu 1.0 was current) as a way to expose more
functionality than what virDomainBlockPull supported, even though we
didn't actually implement new functionality until libvirt 1.0.0 and qemu
1.3 (commit c1eb380). The libvirt API design was sound enough that I
was able to drive the eventual qemu implementation without any problems,
and the implementation could be backported without a so-name bump all
the way to 0.9.10.
I do want to emphasize that both image fleecing and point-in-time
snapshots are features that people want. At the same time, today's
qemu.git does not yet have all the patches in place, and we are past
soft freeze for qemu 1.6, so there may be a bit of a debate on the qemu
list on what aspects of the proposed patches to take, or even a decision
that it is too controversial and will wait until qemu 1.7 before being
in upstream qemu. Historically, we are reluctant to add implementations
to upstream libvirt until the corresponding qemu feature is fully baked
upstream, and we leave it to distro backporters to decide if the feature
is important enough to backport onto whatever earlier version they base
their distro on. At the same time, distro backporters have more
flexibility with pulling changes that do not require a so-name bump, and
I'm fairly confident that we need a new libvirt API to drive the
features, so if we want to support a distro using libvirt 1.1.1, then we
need to settle on the libvirt API now even if it remains unimplemented
for another libvirt release.

Also, in the past, I have posted a proposed API for
virDomainBlockCopy() [1], but left it unimplemented in upstream libvirt
in case future qemu came up with more options that would need tweaking.
At this point in time, now that qemu is talking both about adding
point-in-time snapshots (block-backup) and image fleecing, I think the
time is right to commit to an API for virDomainBlockCopy().

[1] https://www.redhat.com/archives/libvir-list/2012-April/msg00632.html

>>
>> We get cheap point-in-time snapshot, and export it through built in NBD
>> server, by commands described below:
>>
>> 1. qemu-img create -f qcow2 -o backing_file=RUNNING-VM.img BACKUP.qcow2
>>
>>    (although the backing_file option is not honoured in the next step
>>    because we *override* backing file with an existing
>>    BlockDriverState, giving it here does no harm and also makes sure
>>    the created image is of right size.)
Use of qemu-img while the file is also owned by a running qemu is
dangerous; we'd need the equivalent of this command to be supported from
within qemu, or else create the destination without naming a backing
file and follow up with something like qemu-img rebase -u to plug in the
metadata of what the eventual backing file name will be, all without
ever opening the backing file externally. But that's low-level
implementation, and shouldn't affect the design of a libvirt API.

>>
>> 2. (HMP) drive_add backing=ide0-hd0,file=BACKUP.qcow2,id=target0,if=none
>>
>>    (where ide0-hd0 is the running BlockDriverState name for
>>    RUNNING-VM.img)

Whether this is done with HMP, or a QMP command gets added in time, is
also a low-level detail.

>>
>> 3. (QMP) drive-backup device=ide0-hd0 mode=drive sync=none target=target0
>>
>>    (NewImageMode 'drive' means target is looked up as a device id, sync
>>    mode 'none' means don't copy any data except copy-on-write the
>>    point in time snapshot data)
>>
>> 4. (QMP) nbd-server-add device=target0
>>
>> When image fleecing done:
>>
>> 1. (QMP) block-job-complete device=ide0-hd0
>>
>> 2. (HMP) drive_del target0
>>
>> 3. rm BACKUP.qcow2
>>
>> Note: HMP drive_add/drive_del has no counterpart in QMP now but a new
>> command blockdev-add to do similar things is WIP, which can be an
>> alternative in QMP flavor.

The earlier design I mentioned for virDomainBlockCopy in 2012 would work
on only one disk at a time; a user could start multiple block jobs, but
would have to coordinate them by hand. Paolo's reply to this thread
suggested an interface that took a list of block devices, rather than
one, and guaranteed that the point-in-time semantic applies to all the
devices at once.
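
To make the multi-disk idea a bit more concrete, here is a rough sketch
in Python of what the QMP traffic for steps 3 and 4 might look like when
several disks have to share one point in time. It only builds JSON; it
does not talk to qemu. The drive-backup arguments (mode=drive,
sync=none, target given as a device id) are copied from the quoted
proposal and may not match what any qemu release ends up accepting.
Grouping the actions under the QMP 'transaction' command is my
assumption about how the "all devices at once" guarantee could be
expressed, and fleecing_commands() is a made-up helper name. The sketch
also assumes the targets were already attached (step 2) and that
nbd-server-start was issued beforehand.

```python
import json


def fleecing_commands(disks):
    """Build QMP messages that start fleecing for several disks together.

    `disks` maps a running device name (e.g. "ide0-hd0") to the id of a
    target that has already been attached (e.g. "target0").
    Hypothetical helper, for illustration only.
    """
    pairs = sorted(disks.items())

    # One drive-backup action per disk, using the argument names from
    # the quoted proposal (mode=drive is a proposal, not a released mode).
    actions = [
        {
            "type": "drive-backup",
            "data": {
                "device": device,
                "mode": "drive",
                "sync": "none",
                "target": target,
            },
        }
        for device, target in pairs
    ]

    # Assumption: a single transaction gives all disks the same point in time.
    start = {"execute": "transaction", "arguments": {"actions": actions}}

    # Export each target through the built-in NBD server (step 4).
    exports = [
        {"execute": "nbd-server-add", "arguments": {"device": target}}
        for _, target in pairs
    ]
    return [start] + exports


if __name__ == "__main__":
    for msg in fleecing_commands({"ide0-hd0": "target0", "virtio0": "target1"}):
        print(json.dumps(msg))
```

Whatever libvirt API we pick would have to hide this kind of grouping
behind a single call, which is part of why a per-disk block job
interface looks like an awkward fit.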
Unfortunately, the current libvirt block job semantics are tied to a
single disk (virDomainBlockStats, virDomainBlockJobAbort), so if we want
to manage multiple disks at a common point in time, it sounds more like
we'd want to treat this as a generic domain job id rather than a libvirt
block job (virDomainGetJobStats, virDomainAbortJob). On the other hand,
virDomainAbortJob is hard-wired to a single background job at a time;
but with image fleecing, we definitely want to support multiple clients
fleecing from different points in time simultaneously, which would imply
having a job id. Therefore, I'm worried that properly supporting this
will involve the addition of multiple APIs; adding just a super-powered
virDomainBlockCopy() does not give us as much control as what I think we
want.

It's late for me, and I know DV wants to cut rc1, but I hope this sparks
some conversations, and that we can decide on whether we need to pursue
the idea of supporting API for image fleecing as part of libvirt 1.1.1,
or whether we punt and state that there is just too much design work
still in a state of flux.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list