Re: [PATCH 2/8] backup: Document nuances between different state capture APIs

On 06/26/2018 11:36 AM, Nir Soffer wrote:
On Wed, Jun 13, 2018 at 7:42 PM Eric Blake <eblake@xxxxxxxxxx> wrote:

Upcoming patches will add support for incremental backups via
a new API; but first, we need a landing page that gives an
overview of capturing various pieces of guest state, and which
APIs are best suited to which tasks.




Needs blank line between list items for easier reading of the source.

Sure.


I think we should describe checkpoints before backups, since the
expected flow is:

- user starts a backup
- system creates a checkpoint using virDomainCheckpointCreateXML
- system queries the amount of data pointed to by the previous
  checkpoint's bitmaps
- system creates temporary storage for the backup
- system starts the backup using virDomainBackupBegin

I actually think it will be more common to create checkpoints via virDomainBackupBegin(), and not virDomainCheckpointCreateXML (the latter exists because it is easy, and may have a use independent from incremental backups, but it is the former that makes chains of incremental backups reliable).

That is, your first backup will be a full backup (no checkpoint as its start) but will create a checkpoint at the same time; then your second backup is an incremental backup (use the checkpoint created at the first backup as the start) and also creates a checkpoint in anticipation of a third incremental backup.

You do have an interesting step in there - the ability to query how much data is pointed to in the delta between two checkpoints (that is, before I actually create a backup, can I pre-guess how much data it will end up copying). On the other hand, the size of the temporary storage for the backup is not related to the amount of data tracked in the bitmap. Expanding on the examples in my 1/8 reply to you:

At T3, we have:

S1: |AAAA----| <- S2: |---BBB--|
B1: |XXXX----|    B2: |---XXX--|
guest sees: |AAABBB--|

where by T4 we will have:

S1: |AAAA----| <- S2: |D--BBDD-|
B1: |XXXX----|    B2: |---XXX--|
                  B3: |X----XX-|
guest sees: |DAABBDD-|
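The bitmap bookkeeping in these diagrams can be sketched with sets of dirty cluster indices. This is a toy model, not the qemu or libvirt API; the names (Bitmap, record_write) are illustrative only:

```python
# Toy model of qemu dirty bitmaps: each bitmap is a set of dirty
# cluster indices; frozen bitmaps (older checkpoints) stop recording.

class Bitmap:
    def __init__(self, name):
        self.name = name
        self.dirty = set()       # cluster indices written since creation
        self.recording = True

def record_write(bitmaps, cluster):
    """A guest write marks the cluster in every bitmap still recording."""
    for b in bitmaps:
        if b.recording:
            b.dirty.add(cluster)

# State at T3: B1 is frozen, B2 tracks changes since B1's checkpoint.
b1 = Bitmap("B1"); b1.dirty = {0, 1, 2, 3}; b1.recording = False
b2 = Bitmap("B2"); b2.dirty = {3, 4, 5}

# T3 -> T4: a backup starts, freezing B2 and creating B3; then the
# guest performs the three writes shown in the T4 diagram.
b2.recording = False
b3 = Bitmap("B3")
for c in (0, 5, 6):
    record_write([b1, b2, b3], c)

print(sorted(b3.dirty))   # [0, 5, 6] -- matches |X----XX-|
print(sorted(b2.dirty))   # [3, 4, 5] -- B2 unchanged, still |---XXX--|
```

Note that once B2 is frozen, further guest writes land only in B3, which is why B2 remains a stable basis for the in-progress backup.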

Back at T3, using B2 as our dirty bitmap, there are two backup models we can pursue to get at the data tracked by that bitmap.

The first is push-model backup (blockdev-backup with "sync":"top" to the actual backup file) - qemu directly writes the |---BBB--| sequence into the destination file (based on the contents of B2), whether or not S2 is modified in the meantime; in this mode, qemu is smart enough to not bother copying clusters to the destination that were not in the bitmap. So the fact that B2 mentions 3 dirty clusters indeed proves to be the right size for the destination file.

The second is pull-model backup (blockdev-backup with "sync":"none" to a temporary file, coupled with a read-only NBD server on the temporary file that also exposes bitmap B2 via NBD_CMD_BLOCK_STATUS) - here, if qemu can guarantee that the client would read only dirty clusters, then it only has to write to the temporary file if the guest changes a cluster that was tracked in B2 (so at most the temporary file would contain |-----B--| if the NBD client finishes before T4); but more likely, qemu will play conservative and write to the temporary file for ANY changes whether or not they are to areas covered by B2 (in which case the temporary file could contain |A----B0-| for the three writes done by T4).

Or put another way, if qemu can guarantee a nice client, then the size of B2 probably overestimates the size of the temporary file; but if qemu plays conservative by assuming the client will read even portions of the file that weren't dirty, then keeping those reads constant will require the temporary file to be as large as the guest is able to dirty data while the backup continues, which may be far larger than the size of B2.

[And maybe this argues that we want a way for an NBD export to force EIO read errors for anything outside of the exported dirty bitmap, thus making the client play nice, so that the temporary file does not have to grow beyond the size of the bitmap - but that's a future feature request]
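The storage-size contrast between the two models can be sketched numerically. A rough sketch, with invented helper names, using the B2 and T3->T4 writes from the diagrams above:

```python
# Illustrative sketch of the storage each backup model needs; the
# function names are invented for this example, not qemu API.

def push_backup_size(bitmap):
    """Push model: the destination holds exactly the clusters in the
    bitmap, since qemu skips clusters the bitmap doesn't cover."""
    return len(bitmap)

def pull_scratch_size(bitmap, guest_writes, nice_client):
    """Pull model: qemu copies a cluster's old content into the scratch
    file before a guest write lands on it.  A 'nice' client reads only
    dirty clusters, so only overwrites of clusters in the bitmap need
    preserving; a conservative server preserves every overwritten
    cluster."""
    if nice_client:
        return len(set(guest_writes) & set(bitmap))
    return len(set(guest_writes))

B2 = {3, 4, 5}             # |---XXX--| from the diagram above
writes_T3_T4 = {0, 5, 6}   # the three guest writes between T3 and T4

print(push_backup_size(B2))                         # 3 clusters
print(pull_scratch_size(B2, writes_T3_T4, True))    # 1  (|-----B--|)
print(pull_scratch_size(B2, writes_T3_T4, False))   # 3  (|A----B0-|)
```

The conservative case grows with how fast the guest dirties data during the backup window, which is why it can exceed the size of B2.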

+    <h2><a id="examples">Examples</a></h2>
+    <p>The following two sequences both capture the disk state of a
+      running guest, then complete with the guest running on its
+      original disk image; but with a difference that an unexpected
+      interruption during the first mode leaves a temporary wrapper
+      file that must be accounted for, while interruption of the
+      second mode has no impact to the guest.</p>


This is not clear; I read this several times and I'm not sure what
you mean here.

I'm trying to convey the point that with example 1...


Blank line between paragraphs


+    <p>1. Backup via temporary snapshot
+      <pre>
+virDomainFSFreeze()
+virDomainSnapshotCreateXML(VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY)

...if you are interrupted here, your <domain> XML has changed to point to the snapshot file...

+virDomainFSThaw()
+third-party copy the backing file to backup storage # most time spent here

+virDomainBlockCommit(VIR_DOMAIN_BLOCK_COMMIT_ACTIVE) per disk
+wait for commit ready event per disk
+virDomainBlockJobAbort() per disk

...and it is not until here that your <domain> XML is back to its pre-backup state. If the backup is interrupted for any reason, you have to manually get things back to the pre-backup layout, whether or not you were able to salvage the backup data.

+      </pre></p>


I think we should mention virDomainFSFreeze and virDomainFSThaw before
these examples, in the same way we mention the other APIs.

Can do.



+
+    <p>2. Direct backup
+      <pre>
+virDomainFSFreeze()
+virDomainBackupBegin()
+virDomainFSThaw()
+wait for push mode event, or pull data over NBD # most time spent here
+virDomainBackupEnd()

In this example 2, using the new APIs, the <domain> XML is unchanged through the entire operation. If you interrupt things in the middle, you may have to scrap the backup data as not being viable, but you don't have to do any manual cleanup to get your domain back to the pre-backup layout.

+    </pre></p>


This means that virDomainBackupBegin will create a checkpoint, and libvirt
will have to create the temporary storage for the backup (e.g. a disk for
the push model, or a temporary snapshot for the pull model). Libvirt will
most likely use local storage, which may fail if the host does not have
enough local storage.

virDomainBackupBegin() has an optional <disks> XML element - if provided, then YOU can control the files (the destination on push model, ultimately including a remote network destination, such as via NBD, gluster, sheepdog, ...; or the scratch file for pull model, which probably only makes sense locally as the file gets thrown away as soon as the 3rd-party NBD client finishes). Libvirt only generates a filename if you don't provide that level of detail.

You're right that the local storage running out of space can be a concern - but also remember that incremental backups are designed to be less invasive than full backups, AND that if one backup fails, you can then kick off another backup using the same starting checkpoint as the one that failed (that is, when libvirt is using B1 as its basis for a backup, but also created B2 at the same time, then you can use virDomainCheckpointDelete to remove B2 by merging the B1/B2 bitmaps back into B1, with B1 once again tracking changes from the previous successful backup to the current point in time).
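That merge-on-delete behavior is conceptually just a bitwise OR of the two bitmaps. A sketch with sets standing in for bitmaps (names invented for illustration, not the qemu bitmap API):

```python
# virDomainCheckpointDelete on B2 conceptually ORs B2's bits back into
# its parent B1, so B1 again tracks everything since the last good backup.

def delete_checkpoint(parent_bits, child_bits):
    """Merge the deleted checkpoint's bitmap into its parent."""
    return parent_bits | child_bits

B1 = {3, 4, 5}     # changes between the full backup and the failed one
B2 = {0, 5, 6}     # changes recorded since the failed backup started
B1 = delete_checkpoint(B1, B2)
print(sorted(B1))  # [0, 3, 4, 5, 6] -- ready to retry the incremental
```

After the merge, retrying the incremental backup from B1 captures every cluster the failed attempt would have, plus anything dirtied since.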


But this may be good enough for many users, so maybe it is good to
have this.

I think we need to show here the more low level flow that oVirt will use:

Backup using external temporary storage:
- virDomainFSFreeze()
- virDomainCheckpointCreateXML()
- virDomainFSThaw()
- Here oVirt will need to query the checkpoints to understand how much
  temporary storage is needed for the backup. I hope we have an API
  for this (I did not read the next patches yet).

I have not exposed one so far, nor do I know if qemu has that easily available. But since it matters to you, we can make it a priority to add that (and the API would need to be added to libvirt.so at the same time as the other new APIs, whether or not I can make it in time for the freeze at the end of this week).

- virDomainBackupBegin()
- third party copy data...
- virDomainBackupEnd()

Again, note that oVirt will probably NOT call virDomainCheckpointCreateXML() directly, but will instead do:

virDomainFSFreeze();
virDomainBackupBegin(dom, "<domainbackup type='pull'/>", "<domaincheckpoint><name>B1</name></domaincheckpoint>", 0);
virDomainFSThaw();
third party copy data
virDomainBackupEnd();

for the first full backup, then for the next incremental backup, do:

virDomainFSFreeze();
virDomainBackupBegin(dom, "<domainbackup type='pull'><incremental>B1</incremental></domainbackup>", "<domaincheckpoint><name>B2</name></domaincheckpoint>", 0);
virDomainFSThaw();
third party copy data
virDomainBackupEnd();

where you are creating bitmap B2 at the time of the first incremental backup (the second backup overall), and that backup consists of the data changed since the creation of bitmap B1 at the time of the earlier full backup.
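For programmatic callers, those two XML arguments are easy to build and sanity-check with a standard XML library. A minimal sketch following the shapes proposed in this thread (the helper names are mine):

```python
# Build the <domainbackup> and <domaincheckpoint> XML strings passed to
# virDomainBackupBegin(), for a full backup creating checkpoint B1 and
# an incremental backup from B1 creating B2.
import xml.etree.ElementTree as ET

def backup_xml(incremental=None):
    root = ET.Element("domainbackup", type="pull")
    if incremental:
        ET.SubElement(root, "incremental").text = incremental
    return ET.tostring(root, encoding="unicode")

def checkpoint_xml(name):
    root = ET.Element("domaincheckpoint")
    ET.SubElement(root, "name").text = name
    return ET.tostring(root, encoding="unicode")

full = (backup_xml(), checkpoint_xml("B1"))
incr = (backup_xml("B1"), checkpoint_xml("B2"))
print(incr[0])  # <domainbackup type="pull"><incremental>B1</incremental></domainbackup>
print(incr[1])  # <domaincheckpoint><name>B2</name></domaincheckpoint>
```

Generating the XML this way also avoids the kind of missing-bracket typo that is easy to make when pasting strings by hand.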

Then, as I mentioned earlier, the minimal XML forces libvirt to generate filenames (which may or may not match what you want), so you can certainly pass in more verbose XML:

<domainbackup type='pull'>
  <incremental>B1</incremental>
  <server transport='unix' socket='/path/to/server'/>
  <disks>
    <disk name='vda' type='block'>
      <scratch dev='/path/to/scratch/dev'/>
    </disk>
  </disks>
</domainbackup>

and of course, we'll eventually want TLS thrown in the mix (my initial implementation has completely bypassed that, other than the fact that the <server> element is a great place to stick in the information needed for telling qemu's server to only accept clients that know the right TLS magic).

If this example helps, I can flesh out the html to give these further insights.

And, if wrapping FSFreeze/Thaw is that common, we'll probably want to reach the point where we add VIR_DOMAIN_BACKUP_QUIESCE as a flag argument to automatically do it as part of virDomainBackupBegin().
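The wrapper such a flag would replace is straightforward; the key detail is thawing even when starting the backup fails. A sketch of the call structure (`dom` would be a libvirt domain object; method names here follow the libvirt naming but the exact Python binding signatures are an assumption):

```python
# Sketch of the freeze/backup/thaw sequence that a hypothetical
# VIR_DOMAIN_BACKUP_QUIESCE flag would fold into virDomainBackupBegin().

def backup_begin_quiesced(dom, backup_xml, checkpoint_xml):
    dom.fsFreeze()                # freeze guest filesystems first
    try:
        return dom.backupBegin(backup_xml, checkpoint_xml, 0)
    finally:
        dom.fsThaw()              # thaw even if starting the backup failed
```

The try/finally matters: leaving a guest frozen because backupBegin raised an error would be far worse than a failed backup.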



This is great documentation, showing both the APIs and how they are
used together, we need more of this!

Well, and it's also been a great resource for me as I continue to hammer out the (LOADS) of code needed to reach a working demo.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list


