On 04/25/2018 03:19 AM, Nikolay Shirokovskiy wrote:
>
> On 24.04.2018 23:02, John Snow wrote:
>>
>> On 04/23/2018 06:38 AM, Nikolay Shirokovskiy wrote:
>>>
>>> On 21.04.2018 00:26, Eric Blake wrote:
>>>> On 04/20/2018 01:24 PM, John Snow wrote:
>>>>
>>>>>>> Why is option 3 unworkable, exactly?:
>>>>>>>
>>>>>>> (3) Checkpoints exist as structures only with libvirt. They are saved
>>>>>>> and remembered in the XML entirely.
>>>>>>>
>>>>>>> Or put another way:
>>>>>>>
>>>>>>> Can you explain to me why it's important for libvirt to be able to
>>>>>>> reconstruct checkpoint information from a qcow2 file?
>>>>>>>
>>>>>>
>>>>>> In short, it takes extra effort to keep the metadata consistent when
>>>>>> a libvirtd crash occurs. See [1] for a more detailed explanation,
>>>>>> starting from the words "Yes it is possible".
>>>>>>
>>>>>> [1] https://www.redhat.com/archives/libvir-list/2018-April/msg01001.html
>>>>
>>>> I'd argue the converse. Libvirt already knows how to do atomic updates
>>>> of XML files that it tracks. If libvirtd crashes/restarts in the middle
>>>> of an API call, you already have indeterminate results of whether the
>>>> API worked or failed; once libvirtd is restarted, you'll probably have
>>>> to retry the command. For all other cases, the API call
>>>> completes, and either no XML changes were made (the command failed and
>>>> reports the failure properly), or all XML changes were made (the command
>>>> created the appropriate changes to track the new checkpoint, including
>>>> whatever bitmap names have to be recorded to map the relation between
>>>> checkpoints and bitmaps).
>>>
>>> We can fail to save XML... Suppose we have B1 and B2 and create bitmap B3
>>> in the process of creating checkpoint C3. Next, qemu creates the snapshot
>>> and bitmap successfully, then libvirt fails to update the XML, and after
>>> some time libvirt restarts (it does not even crash). Now libvirt knows of
>>> B1 and B2 but not B3.
>>> What can be the consequences?
>>> For example, if we ask for the bitmap from C2, we
>>> miss all changes from C3 because we don't know of B3. This will lead to
>>> corrupted backups.
>>>
>>> This can be fixed:
>>>
>>> - in qemu. If bitmaps have a child/parent relationship, then on libvirt
>>>   restart we can recover (we ask qemu for bitmaps, discover B3 and then
>>>   discover that B3 is a child of B2). This is basically how the
>>>   implementation with a naming scheme works. This way we don't need
>>>   special metadata in libvirt (besides maybe the domain XML attached to
>>>   a checkpoint, etc.)
>>>
>>> - in libvirt, if we save the XML before creating a snapshot with a
>>>   checkpoint. This fixes the case of a successful operation followed by
>>>   a failure to save the XML. But now we have another issue :) We can save
>>>   the XML successfully, but then the operation itself can fail and we can
>>>   fail to revert the XML. Well, we can recover even without child/parent
>>>   metadata in qemu in this case: just ask qemu for bitmaps on libvirt
>>>   restart, and if a bitmap is missing, kick it out, since this is the
>>>   case described above (XML saved successfully, then an unsuccessful
>>>   qemu operation)
>>>
>>
>> This option seems perfectly workable to me...
>>
>>> So it is possible to track bitmaps in libvirt. We just need to be extra
>>> careful not to produce invalid backups.
>>>
>>>>
>>>> Consider the case of internal snapshots.
>>>> Already, we have the case
>>>> where qemu itself does not track enough useful metadata about internal
>>>> snapshots (right now, just a name and timestamp of creation); so libvirt
>>>> additionally tracks further information in <domainsnapshot>: the name,
>>>> timestamp, relationship to any previous snapshot (libvirt can then
>>>> reconstruct a tree relationship between all snapshots; where a parent
>>>> can have more than one child if you roll back to a snapshot and then
>>>> execute the guest differently), the set of disks participating in the
>>>> snapshot, and the <domain> description at the time of the snapshot (if
>>>> you hotplug devices, or even the fact that creating external snapshots
>>>> changes which file is the active qcow2 in a backing chain, you'll need
>>>> to know how to roll back to the prior domain state as part of
>>>> reverting). This is approximately the same set of information that a
>>>> <domaincheckpoint> will need to track.
>>>
>>> I would differentiate checkpoints and backups. For example, in the case
>>> of push backups we can store additional metadata in <domainbackup>
>>> so later we can revert back to the previous state. But checkpoints
>>> (bitmaps, technically) exist only to make incremental backups (restores?).
>>> We can attach extra metadata to checkpoints, but it looks accidental,
>>> just because bitmaps and backups relate to the same point in time. To me
>>> a backup (push) can carry all the metadata, and as to checkpoints, a
>>> backup can have an associated checkpoint or not. For example, if we
>>> choose to always make full backups we don't need checkpoints at all (at
>>> least if we are not going to use them for restore).
>>>
>>
>> Well ... if we create checkpoints alongside full backups, then you have
>> points to reference to create future incremental backups. You don't need
>> checkpoints if you *NEVER* use an incremental backup.
>> If we want the
>> feature enabled, so to speak, you likely need to be making checkpoints
>> alongside full backups.
>>
>> I'd say the cases in which we don't want them -- once the feature is
>> enabled -- are hard to find.
>>
>>>>
>>>> I'm slightly tempted to just overload <domainsnapshot> to track three
>>>> modes instead of two (internal, external, and now checkpoint); but think
>>>> that will probably be a bit too confusing, so more likely I will create
>>>> <domaincheckpoint> as a new object, but copy a lot of coding paradigms
>>>> from <domainsnapshot>.
>>>
>>> I wonder if you are going to use a tree or a list structure for backups.
>>> To me it is much easier to think of backups just as a sequence of states
>>> in time. For example, consider the Grandfather-Father-Son scheme of
>>> Acronis backups [1]. A typical backup can look like:
>>>
>>> F - I - I - I - I - D - I - I - I - I - D
>>>
>>> Where F is a full monthly backup, I is an incremental daily backup and D
>>> is a differential weekly backup (no backups on Sunday and Saturday).
>>> This is the representation from the time POV. From the backup
>>> dependencies POV it looks like this:
>>>
>>> F - I - I - I - I   D - I - I - I - I   D
>>> \-------------------|                   |
>>>  \--------------------------------------|
>>>
>>> or the more common representation:
>>>
>>> F - I - I - I - I
>>>  \- D - I - I - I - I
>>>  \- D - I - I - I - I
>>>
>>> To me using a tree structure for snapshots is appropriate because each
>>> branching point is some semantic state ("basic OS installed") and the
>>> branches are different trials from that point. In the backup case I guess
>>> we don't want branching on recovery to some backup; we just want to keep
>>> the selected backup scheme going. So for example, if we recover on
>>> Wednesday to the previous week's Friday, then later on Wednesday we will
>>> have the regular Wednesday backup as if we had not recovered. This keeps
>>> things simple for the client; otherwise he will drown in dependencies
>>> (especially after a couple of recoveries).
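
As a minimal sketch of the two views above (a time-ordered sequence versus a dependency tree), the Python below models the F/I/D schedule with one parent reference per backup. The `Backup` class, `build_schedule` and `restore_chain` are hypothetical names made up for illustration; nothing here is libvirt API or code from this thread. It assumes each incremental depends on the backup taken just before it and each differential depends on the full backup.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backup:
    name: str               # hypothetical label such as "F", "I1", "D1"
    kind: str               # "full", "incremental" or "differential"
    parent: Optional[str]   # backup this one depends on; None for a full backup


def build_schedule() -> List[Backup]:
    """Return F - I - I - I - I - D - I - I - I - I - D in time order."""
    backups = [Backup("F", "full", None)]
    prev = "F"
    for i in range(1, 5):
        backups.append(Backup(f"I{i}", "incremental", prev))
        prev = f"I{i}"
    # Assumption: a differential depends on the full backup,
    # not on the incremental taken just before it.
    backups.append(Backup("D1", "differential", "F"))
    prev = "D1"
    for i in range(5, 9):
        backups.append(Backup(f"I{i}", "incremental", prev))
        prev = f"I{i}"
    backups.append(Backup("D2", "differential", "F"))
    return backups


def restore_chain(backups: List[Backup], target: str) -> List[str]:
    """Walk parent links from target up to the full backup, oldest first."""
    by_name = {b.name: b for b in backups}
    chain = []
    cur: Optional[str] = target
    while cur is not None:
        chain.append(cur)
        cur = by_name[cur].parent
    chain.reverse()
    return chain


schedule = build_schedule()
```

Iterating over `schedule` gives the flat time sequence, while `restore_chain(schedule, "I6")` follows parent links and yields F, D1, I5, I6, so the tree only comes into play when working out what a restore or a delete depends on.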
>>>
>>
>> But your representation is itself a tree -- is this a good argument
>> against hierarchical information ... ?
>>
>> If you don't utilize the hierarchy, the degenerate form is indeed just a
>> list:
>>
>> F - I - I - I - I - I - I - I - I - I ...
>>
>> everything has just one successor.
>>
>> I think Eric just feels he can get good code re-use out of the
>> <domainsnapshot> element -- since each <snapshot> element itself
>> references a parent ID; there's no real "cost" to tracking a tree
>> instead of a list.
>>
>> There's nothing stopping you from adding three checkpoints that have the
>> same parent, so to speak.
>>
>> I think this is just something that might wind up happening "for free"
>> due to the nature of how libvirt stores relational data at all.
>
> I mean we have to store the tree structure for backups, of course. I suggest
>
> - not to expose the tree structure through the API in the first place.
>   For example we can have an API like
>
>   - virDomainBackupList(time_t from, time_t to,
>                         virDomainBackupPtr **backups,
>                         unsigned int flags)
>
>     to list backups in some period of time with flags like
>     - 'only full backups',
>     - 'include parent backups if they don't fit into interval'
>     - 'include children backups if they don't fit into interval'
>
>   - virDomainBackupListChildren(virDomainBackupPtr parent,
>                                 virDomainBackupPtr **backups,
>                                 unsigned int flags)
>
>     to list a backup's children
>
> - in case of restore, don't branch from the restored state; instead just
>   continue to back up as if the changes brought by the restore were
>   produced by the guest
>
> So the API has means to explore the tree structure eventually
> (virDomainBackupListChildren), but I suggest we think of backups, and
> provide means to work with them, as a sequence in time, not a tree, in
> the first place.
>

Oh, sure. That might be reasonable, but I'll probably defer to Eric's
opinion here.

The XML storage can be tree-based (as a natural occurrence) but I don't
know if we need to make the API tree-based, right.
I don't have a really strong stance here -- I'd say whatever makes the
most sense with the implementation that best facilitates code re-use in
libvirt.

--js

>>
>>> Of course internally we need to track backup dependencies in order to
>>> properly delete backups or recover from them.
>>>
>>> [1] https://www.acronis.com/en-us/support/documentation/AcronisBackup_11.5/index.html#760.html
>>>>
>>>> So, from that point of view, libvirt tracking the relationship between
>>>> qcow2 bitmaps in order to form checkpoint information can be done ALL
>>>> with libvirt, and without NEEDING the qcow2 file to track any relations
>>>> between bitmaps. BUT, libvirt's job can probably be made easier if
>>>> qcow2 would, at the least, allow bitmaps to track their parent, and/or
>>>> provide APIs to easily merge a parent..intermediate..child chain of
>>>> related bitmaps into a single bitmap, for easy runtime
>>>> creation of the temporary bitmap used to express the delta between two
>>>> checkpoints.
>>>>
>>>
>>> [snip]
>>>
>>> Nikolay
>>>

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list