On 04/20/2018 01:24 PM, John Snow wrote:

>>> Why is option 3 unworkable, exactly?:
>>>
>>> (3) Checkpoints exist as structures only with libvirt. They are saved
>>> and remembered in the XML entirely.
>>>
>>> Or put another way:
>>>
>>> Can you explain to me why it's important for libvirt to be able to
>>> reconstruct checkpoint information from a qcow2 file?
>>>
>>
>> In short it take extra effort for metadata to be consistent when
>> libvirtd crashes occurs. See for more detailed explanation
>> in [1] starting from words "Yes it is possible".
>>
>> [1] https://www.redhat.com/archives/libvir-list/2018-April/msg01001.html

I'd argue the converse. Libvirt already knows how to do atomic updates
of XML files that it tracks. If libvirtd crashes/restarts in the middle
of an API call, you already have indeterminate results of whether the
API worked or failed; once libvirtd is restarted, you'll probably have
to retry the command. For all other cases, the API call completes, and
either no XML changes were made (the command failed and reports the
failure properly), or all XML changes were made (the command created the
appropriate changes to track the new checkpoint, including whatever
bitmap names have to be recorded to map the relation between checkpoints
and bitmaps).

Consider the case of internal snapshots.
Already, we have the case where qemu itself does not track enough useful
metadata about internal snapshots (right now, just a name and timestamp
of creation); so libvirt additionally tracks further information in
<domainsnapshot>: the name, timestamp, relationship to any previous
snapshot (libvirt can then reconstruct a tree relationship between all
snapshots; where a parent can have more than one child if you roll back
to a snapshot and then execute the guest differently), the set of disks
participating in the snapshot, and the <domain> description at the time
of the snapshot (if you hotplug devices, or even just because creating
external snapshots changes which file is the active qcow2 in a backing
chain, you'll need to know how to roll back to the prior domain state as
part of reverting). This is approximately the same set of information
that a <domaincheckpoint> will need to track.

I'm slightly tempted to just overload <domainsnapshot> to track three
modes instead of two (internal, external, and now checkpoint); but I
think that will probably be a bit too confusing, so more likely I will
create <domaincheckpoint> as a new object, but copy a lot of coding
paradigms from <domainsnapshot>.

So, from that point of view, libvirt tracking the relationship between
qcow2 bitmaps in order to form checkpoint information can be done ALL
with libvirt, and without NEEDING the qcow2 file to track any relations
between bitmaps. BUT, libvirt's job can probably be made easier if
qcow2 would, at the least, allow bitmaps to track their parent, and/or
provide APIs to easily merge a parent..intermediate..child chain of
related bitmaps into a single bitmap, for easy runtime creation of the
temporary bitmap used to express the delta between two checkpoints.

>
> OK; I can't speak to the XML design (I'll leave that to Eric and other
> libvirt engineers) but the data consistency issues make sense.

And I'm still trying to figure out exactly what is needed to capture
everything required to create checkpoints and take backups (both push
and pull model). Reverting to data from an external backup may be a bit
more manual, at least at first (after all, we STILL don't have decent
libvirt support for rolling back to external snapshots, several years
later). In other words, my focus right now is "how can we safely track
checkpoints for capturing point-in-time incremental backups with minimal
guest downtime", rather than "given an incremental backup captured
previously, how do we roll a guest back to that point in time".

>
> ATM I am concerned that by shifting the snapshots into bitmap names that
> you still leave yourself open for data corruption if these bitmaps are
> modified outside of libvirt -- these third party tools can't possibly
> understand the schema that they were created under.
>
> (Though I suppose very simply that if a bitmap is missing you'd be able
> to detect that in libvirt and signal an error, but it's not very nice.)

Well, we also have to realize that third-party tools shouldn't really be
mucking around with bitmaps they don't understand. If you are going to
manipulate a qcow2 file that contains persistent bitmaps, you should not
delete a bitmap you did not create; and if the bitmap is autoloaded, you
must obey the rules and amend the bitmap for any guest-visible changes
you make during your data edits. Just like a third-party tool shouldn't
really be deleting internal snapshots it didn't create. I don't think
we have to worry as much about being robust to what a third party tool
would do behind our backs (after all, the point of the pull model
backups is so that third-party tools can track the backup in the format
THEY choose, after reading the dirty bitmap and data over NBD, rather
than having to learn qcow2).

>
> I'll pick up discussion with Eric and Vladimir in the other portion of
> this thread where we're discussing a checkpoints API and we'll pick this
> up on QEMU list if need be.

Yes, between this thread, and some IRC chats I've had with John in the
meantime, it looks like we DO want some improvements on the qcow2 side
of things on the qemu list.

Other things that I need to capture from IRC:

Right now, it sounds like the incremental backup model (whether push or
pull) is heavily dependent on qcow2 files for persistent bitmaps. While
libvirt can perform external snapshots by creating a qcow2 wrapper
around any file type, and live commit can then merge that qcow2 file
back into the original file, libvirt is already insistent that internal
snapshots can only be taken if all disks are qcow2. So the same logic
will apply to taking backups (whether the backup is incremental by
starting from a checkpoint, or full over the complete disk contents).

Also, how should checkpoints interact with external snapshots? Suppose
I have:

base <- snap1

and create a checkpoint at time T1 (which really means I create a bitmap
titled B1 to track all changes that occur _after_ T1). Then later I
create an external snapshot, so that now I have:

base <- snap1 <- snap2

at that point, the bitmap B1 in snap1 is no longer being modified,
because snap1 is read-only. But we STILL want to track changes since
T1, which means we NEED a way in qemu to not only add snap2 as a new
snapshot, but ALSO to create a new bitmap B2 in snap2, that tracks all
changes (until the next checkpoint, of course). Whether B2 starts life
empty (and libvirt just has to remember that it must merge snap1.B1 and
snap2.B2 when constructing the delta), or whether B2 starts life as a
clone of the final contents of snap1.B1, is something that we need to
consider in qemu. And if there is more than one bitmap on snap1, do we
need to bring all of those bitmaps forward into snap2, or just the one
that was currently active?
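
The two possible starting states for B2 can be compared with a small toy
model. This is only a sketch under stated assumptions: each bitmap is
modelled as a Python set of dirty cluster indices, the cluster numbers
are made up, and merge_bitmaps is a hypothetical helper, not a qemu or
libvirt API. In this model both options give the same delta since T1;
what differs is who has to remember to merge snap1.B1 into the result.

```python
# Toy model only: a "bitmap" is a set of dirty cluster indices.
# None of these names correspond to real qemu or libvirt interfaces.

def merge_bitmaps(*bitmaps):
    """Return the union of the dirty clusters in all given bitmaps."""
    merged = set()
    for bitmap in bitmaps:
        merged |= bitmap
    return merged


# base <- snap1: the checkpoint at T1 creates B1 in snap1.
snap1_b1 = {3, 7}              # clusters written after T1, before snap2

# base <- snap1 <- snap2: snap1 (and so B1) is now read-only.
writes_after_snap2 = {7, 12}   # clusters written once snap2 is active

# Option A: B2 starts empty; the caller must merge B1 and B2 itself.
snap2_b2_empty = set()
snap2_b2_empty |= writes_after_snap2
delta_a = merge_bitmaps(snap1_b1, snap2_b2_empty)

# Option B: B2 starts as a clone of the final contents of snap1.B1,
# so B2 alone already describes every change since T1.
snap2_b2_clone = set(snap1_b1)
snap2_b2_clone |= writes_after_snap2
delta_b = snap2_b2_clone
```

With option A the relation between B1 and B2 has to be recorded
somewhere outside the bitmaps themselves; with option B the newest
bitmap is self-contained, at the cost of copying B1 when snap2 is
created.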

Similarly, if we later decide to live commit snap2 back into snap1,
we'll want to merge the changes in snap2.B2 back into snap1.B1 (now that
snap1 is once again active, it needs to track all changes that were
merged in, and all future changes until the next snapshot). Which means
we need to at least be thinking about cross-node snapshot merges, even
if, from the libvirt perspective, checkpoints are more of a per-drive
attribute rather than a per-node attribute.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list