On 03/30/2010 08:14 PM, Matthias Bolte wrote:
> 2010/3/30 Chris Lalancette <clalance@xxxxxxxxxx>:
>> Hello,
>> After our discussions about the snapshot API last week, I went ahead and implemented
>> quite a bit of the API. I also went back to the ESX, Virtualbox, and QEMU API's to
>> try and make sure our API's matched up. What's below is my revised API based on
>> that survey. Following my revised API are notes that I took regarding how the
>> libvirt API matches up to the various API's, and some questions about semantics that
>> I had while doing the survey. More comments and questions are welcome.
>
>> /* Start the guest from the snapshot "snapshot" */
>> int virDomainCreateFromSnapshot(virDomainSnapshotPtr snapshot,
>>                                 unsigned int flags);
>
> Will it be enforced that the domain is shutdown in order to call this function?
>
> ESX doesn't have such a restriction. Not sure about other hypervisors.

Heh, I was just going through that myself. No, the domain is not required to
be shut down in general; qemu supports both modes. I've updated the
documentation for this call.

<snip>

>>  * Note that if other snapshots would be discarded because of this
>>  * MERGE action, this operation will fail. If that is really what is intended,
>>  * use MERGE_FORCE.
>>  *
>>  * With a DISCARD flag, it deletes the snapshot. Note that if children snapshots
>>  * would be discarded because of this delete action, this operation will
>>  * fail. If this is really what is intended, use DISCARD_FORCE.
>>  *
>>  * MERGE, MERGE_FORCE, DISCARD, and DISCARD_FORCE are mutually-exclusive.
>>  *
>>  * Note that this operation can happen when the domain is running or shut
>>  * down, though this is hypervisor specific */
>> typedef enum {
>>     VIR_DOMAIN_SNAPSHOT_DELETE_MERGE,
>>     VIR_DOMAIN_SNAPSHOT_DELETE_MERGE_FORCE,
>>     VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD,
>>     VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD_FORCE,
>> } virDomainSnapshotDelete;
>> int virDomainSnapshotDelete(virDomainSnapshotPtr snapshot,
>>                             unsigned int flags);
>>
>> int virDomainSnapshotFree(virDomainSnapshotPtr snapshot);
>>
>> NOTE: During snapshot creation, *none* of the fields are required. That is,
>> you can call virDomainSnapshotCreateXML() with an XML of "<domainsnapshot/>".
>> In this case, the individual driver will make up a <name> and <uuid> for you,
>
> Does <uuid> here refer to a snapshot UUID? As said before, there is no
> easy way to have a UUID per snapshot with ESX. Well, we could store
> <uuid>:<name> in the name field on the ESX side, but that's not a
> really good way to do it.

Yeah, agreed, that was a leftover I forgot to edit out. See my reply to
Jiri Denemark, but essentially I'm content to declare duplicate names
unsupported/undefined, and not deal with UUIDs at all. I've removed mention
of UUIDs from the documentation now.

<snip>

>> The virsh commands will be:
>> virsh snapshot-create <dom> <xmlfile>
>> virsh snapshot-list <dom>
>> virsh snapshot-dumpxml <dom> <name>
>> virsh start-with-snapshot <dom> <snapshotname>
>> virsh snapshot-delete <dom> <snapshotname> [--merge|--mergeforce|--delete|--deleteforce]
>> virsh snapshot-delete-all <dom>
>>
>> Possible issues:
>> 1) I don't see a way to support "managed" save/restore and snapshotting with
>> this API. I think we'll have to have a separate API for managed save/restore.
>
> What's "managed" save/restore and snapshotting?

Oops, yeah, that's a personal note that I didn't really expound upon.
One of the reasons that I originally started down the path of implementing
snapshotting was to implement save/restore for guests during host shutdown and
startup. Because of the way autostart works within libvirt, we can't have an
external script (a la xendomains) do this; it needs to be handled inside the
libvirt daemon itself, and our current save/restore API is not sufficient for
this. That being said, after all of the discussions we have had about this
snapshotting API, I don't think it will be appropriate to shoehorn this
"managed" save/restore into this API, and we'll need a separate API for that.

>
>> 3) Do we need a snapshot UUID? Virtualbox allows you to have multiple snapshots
>> with the same name, differentiated by UUID. Confusingly, they also have a
>> "FindByName" method that returns the first depth-first search snapshot that matches
>> a given name. For qemu, if you specify the same name twice it overwrites the previous
>> one with the new one. I don't know what ESX does here.
>
> ESX 4.0 allows multiple snapshots with the same name. I think this is
> because ESX 4.0 has an integer ID per snapshot. I'm not sure if ESX
> 3.5 allows multiple snapshots with the same name, because the ID field
> was added in ESX 4.0. I assume ESX 3.5 doesn't allow multiple
> snapshots with the same name, but I have currently no ESX 3.5 at hand
> to test.
>
> We could use this integer ID and convert it to UUID format, but you
> won't be able to set the UUID, it'll be read-only and only available
> on ESX 4.0 and above.

Yeah, again, I'm happy to drop UUID and declare duplicate names unsupported
unless there is a good use case.
>
>> Mapping of our interface to various hypervisors:
>> +-------------------------------+-----------------+-------------------+------------------------------+
>> | Libvirt                       | Qemu            | Virtualbox        | ESX                          |
>> +-------------------------------+-----------------+-------------------+------------------------------+
>> | virDomainSnapshotCreateXML    | monitor command | takeSnapshot      | CreateSnapshot_task          |
>> |                               | "savevm"; if    | Snapshots can     | takes a name, description,   |
>> |                               | snapshot name   | be taken on       | memory (true/false) and      |
>> |                               | is already in   | powered off,      | quiesce (true/false).        |
>> |                               | use, replaces   | saved, running,   | What does "memory" mean?     |
>
> If memory is true, ESX snapshots the memory of the domain too,
> otherwise only a disk snapshot is created.
>
> Creating a disk-only snapshot is nearly instant, while creating a
> memory snapshot also requires a notable amount of time to write the
> memory image to disk.

Sorry, I misread the documentation yesterday. That's fairly clear. What's
less clear to me is what happens when you take a disk-only snapshot, and then
try to RevertToSnapshot from a running VM. What happens in that case?

>
>> |                               | the previous    | or paused VMs.    | Should we model "quiesce"    |
>
> The vSphere API docs give a good description what the quiesce option does:
>
> "If TRUE and the virtual machine is powered on when the snapshot is
> taken, VMware Tools is used to quiesce the file system in the virtual
> machine. This assures that a disk snapshot represents a consistent
> state of the guest file systems. If the virtual machine is powered off
> or VMware Tools are not available, the quiesce flag is ignored."
>
> I assume "quiesce the file system" means to flush write caches and
> stuff like that.
>
> This option is important if you want to create a disk-only snapshot of
> a running domain.

Exactly.
I'm not sure this is going to be possible in general (and I guess it's not
even really possible in ESX unless you install VMware Tools inside the guest).
I'm inclined not to model it at the moment, although I could be convinced
otherwise.

>
>> |                               | snapshot. Also  | The snapshot is   | Trees of snapshots are       |
>> |                               | qemu-img        | always taken      | supported. What happens      |
>> |                               | snapshot -c can | against the       | on a duplicate name? What    |
>> |                               | be used to      | current snapshot. | state(s) can a VM be in      |
>> |                               | create a        | What happens on   | when calling this? Does      |
>> |                               | disk-only       | a duplicate       | a VM get paused when this    |
>> |                               | snapshot. What  | name? Trees of    | is called?                   |
>
> In case of ESX the domain can be in any state when a snapshot is created.
>
> If the domain is running when you create a snapshot then the domain is
> _not_ paused during the snapshot creation.
>
> I tested it and the memory snapshot represents the state at the time
> the snapshot command was issued.

OK, great. I'll update these notes about that.

>
>> |                               | happens if the  | snapshots are     |                              |
>> |                               | VM is running   | not currently     |                              |
>> |                               | when you do     | supported.        |                              |
>> |                               | this? Trees of  | Taking a snapshot |                              |
>> |                               | snapshots seem  | of a running VM   |                              |
>> |                               | to be supported | pauses the VM     |                              |
>> |                               | VM gets paused  | before taking the |                              |
>> |                               | while this is   | snapshot.         |                              |
>> |                               | happening. What |                   |                              |
>> |                               | states can the  |                   |                              |
>> |                               | VM be in?       |                   |                              |
>> +-------------------------------+-----------------+-------------------+------------------------------+
>
>
>> +-------------------------------+-----------------+-------------------+------------------------------+
>> | virDomainSnapshotDelete       | monitor command | deleteSnapshot    | RemoveSnapshot_Task          |
>> |                               | "delvm". What   | deletes the       | removes this snapshot and    |
>> |                               | happens if the  | specified         | deletes any associated       |
>> |                               | snapshot is in  | snapshot. Takes   | storage. Operates on a       |
>> |                               | use? What       | an ID. The VM     | VirtualMachineSnapshot       |
>> |                               | states can the  | must be off.      | object. What states can      |
>> |                               | VM be in? Also  | Differences to    | the VM be in? What           |
>> |                               | qemu-img        | children          | happens if this snapshot     |
>> |                               | snapshot -d     | snapshots will be | is in-use? What happens      |
>> |                               | <name> <file>   | merged with the   | to parents and children?     |
>> |                               | command can be  | children to keep  |                              |
>> |                               | used. What      | children valid.   |                              |
>> |                               | happens if the  | Parent for this   |                              |
>> |                               | disk is in-use? | snapshot will     |                              |
>> |                               | What happens to | become parent of  |                              |
>> |                               | parents and     | any children      |                              |
>> |                               | children?       | snapshots.        |                              |
>> |                               | How do we       |                   |                              |
>> |                               | handle merges?  |                   |                              |
>> +-------------------------------+-----------------+-------------------+------------------------------+
>
> The domain can be in any state when deleting a snapshot, even if you
> delete the current snapshot. VMware has some documentation about how a
> snapshot is merged into its parent:
>
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002836
>
> And some more general docs about snapshots:
>
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1015180
>
> Regarding what gets merged and where, I should define the terms I'm
> using first.
>
>   A <--1-- B <--2-- C <--3-- current
>   + <--4-- D
>
> I intentionally draw the arrows directed from child to parent.
>
> A, B, C, D are what I call a snapshot, a point in "time" I can switch
> to. The disk differences between these points are stored in COW sparse
> images, here shown as 1, 2, 3, 4. The current state of the domain is
> denoted by the "current" item.
>
> Each snapshot is associated with a disk image: A is associated with
> the base image, B with sparse image 1, C with 2 and so on. A special
> case is sparse image 3, it's not associated with a snapshot, but with
> the current state.
> Also each snapshot can be associated with a memory
> image (not shown here).
>
> The current snapshot in this case is C. If the domain writes changes
> to disk, these changes get stored in sparse image 3. If you switch to
> another snapshot from here then the changes in 3 are lost, because you
> cannot go back to a point where you could access the changes in 3
> again.
>
> Now let's delete B. In this case the memory image associated with B is
> just discarded and 1 and 2 are merged into 5. That's what I was
> referring to when I said ESX merges snapshots into the parent.
>
>   A <------5------- C <--3-- current
>   + <--4-- D
>
> But this only happens for snapshots like B, that have a parent and a
> child (C is such a snapshot too, even if its child isn't an actual
> snapshot). If you delete D in this example, then the changes in sparse
> image 4 are discarded, because there is no place where they could be
> merged. Merging 4 in the base image would alter A, merging 4 and 5
> would alter C.
>
> Now as I think of this in detail, it seems that the term "merging into
> the parent" is wrong.
>
> In the next example we have snapshot E with parent B.
>
>   A <--1-- B <--2-- C <--3-- current
>            + <--6-- E
>
> Now what's going to happen if we delete B? In order to preserve C and
> E, the changes in 1 need to be merged into 2 and 6, this results in 1
> + 2 = 5 and 1 + 6 = 7. Or rephrased: B is merged into its children C
> and E.
>
>   A <------5------- C <--3-- current
>   + <------7------- E
>
> So, virDomainSnapshotDelete's semantic for VirtualBox and ESX seems to
> be the same. I just used the wrong words to describe it at first.
> Sorry for that.

OK, that's very interesting to know. So VirtualBox and ESX seem to do the
same thing here. The last thing I still have to test is what qemu does in
this case; I'll get to that today, and then we can look again at the
semantics of the flags to virDomainSnapshotDelete.
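
To make the "merged into its children" behaviour concrete, here is a minimal
sketch in C of how I read the ESX/VirtualBox delete semantic described above.
Everything in it is a hypothetical in-memory model for illustration only:
struct snap, snap_init(), snap_delete(), the fixed-size arrays, and the "1+2"
style delta labels are made up and are not libvirt, ESX, or VirtualBox API.
Refusing to delete the base image is also my assumption, since the thread
does not cover that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 8
#define DELTA_LEN 64

/* Hypothetical model of one snapshot in a tree; not a real hypervisor type. */
struct snap {
    const char *name;
    char delta[DELTA_LEN];               /* label of the COW image holding changes vs. parent */
    struct snap *parent;                 /* NULL for the base image */
    struct snap *children[MAX_CHILDREN];
    int nchildren;
};

/* Initialize s and link it under parent (NULL for the root).
 * Returns 0 on success, -1 if the label or the child list would overflow. */
int snap_init(struct snap *s, const char *name, const char *delta,
              struct snap *parent)
{
    size_t len = strlen(delta);

    if (len >= DELTA_LEN)
        return -1;
    if (parent && parent->nchildren >= MAX_CHILDREN)
        return -1;
    memset(s, 0, sizeof(*s));
    s->name = name;
    memcpy(s->delta, delta, len + 1);
    s->parent = parent;
    if (parent)
        parent->children[parent->nchildren++] = s;
    return 0;
}

/* Delete s. Its delta is merged into every child (label "parent+child") and
 * the children are reparented to s's parent. A leaf's delta is simply dropped.
 * Returns 0 on success, -1 if s is the root or a fixed-size limit is hit. */
int snap_delete(struct snap *s)
{
    struct snap *p = s->parent;
    size_t slen = strlen(s->delta);
    int i, idx = -1;

    if (!p)
        return -1;                       /* assumption: the base image stays */
    if (p->nchildren - 1 + s->nchildren > MAX_CHILDREN)
        return -1;
    /* Validate every merged label first so a failure changes nothing. */
    for (i = 0; i < s->nchildren; i++)
        if (slen + 1 + strlen(s->children[i]->delta) >= DELTA_LEN)
            return -1;

    /* Unlink s from its parent's child list (order is not preserved). */
    for (i = 0; i < p->nchildren; i++)
        if (p->children[i] == s)
            idx = i;
    assert(idx >= 0);
    p->children[idx] = p->children[--p->nchildren];

    /* Merge s's delta into each child, then hand the child to p. */
    for (i = 0; i < s->nchildren; i++) {
        struct snap *c = s->children[i];
        size_t clen = strlen(c->delta);

        memmove(c->delta + slen + 1, c->delta, clen + 1);  /* shift label and NUL right */
        memcpy(c->delta, s->delta, slen);
        c->delta[slen] = '+';
        c->parent = p;
        p->children[p->nchildren++] = c;
    }

    s->nchildren = 0;
    s->parent = NULL;
    return 0;
}
```

With the tree from the second example (B under A with delta 1, C and E under
B with deltas 2 and 6), calling snap_delete() on B leaves C and E as children
of A with deltas "1+2" and "1+6", which correspond to images 5 and 7 in the
diagram. Deleting a leaf such as D or E just discards its delta.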
--
Chris Lalancette