Re: [PATCH 4/8] backup: Document new XML for backups

On 06/26/2018 02:51 PM, Nir Soffer wrote:
On Wed, Jun 13, 2018 at 7:42 PM Eric Blake <eblake@xxxxxxxxxx> wrote:

Prepare for new checkpoint and backup APIs by describing the XML
that will represent a checkpoint.  This is modeled heavily after
the XML for virDomainSnapshotPtr, since both represent a point in
time of the guest.  But while a snapshot exists with the intent
of rolling back to that state, a checkpoint instead makes it
possible to create an incremental backup at a later time.

Add testsuite coverage of a minimal use of the XML.

+++ b/docs/formatcheckpoint.html.in
@@ -0,0 +1,273 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE html>
+<html xmlns="http://www.w3.org/1999/xhtml">
+  <body>
+    <h1>Checkpoint and Backup XML format</h1>
+
+    <ul id="toc"></ul>
+
+    <h2><a id="CheckpointAttributes">Checkpoint XML</a></h2>


id=CheckpointXML?

Matches what the existing formatsnapshot.html.in named its tag. (If you haven't guessed, I'm heavily relying on snapshots as my template for adding this).



+
+    <p>
+      Domain disk backups, including incremental backups, are one form
+      of <a href="domainstatecapture.html">domain state capture</a>.
+    </p>
+    <p>
+      Libvirt is able to facilitate incremental backups by tracking
+      disk checkpoints, or points in time against which it is easy to
+      compute which portion of the disk has changed.  Given a full
+      backup (a backup created from the creation of the disk to a
+      given point in time, coupled with the creation of a disk
+      checkpoint at that time),


Not clear.


and an incremental backup (a backup
+      created from just the dirty portion of the disk between the
+      first checkpoint and the second backup operation),


Also not clear.

Okay, I will try to improve these in v2. But (other than answering these good review emails), my current priority is a working demo (to prove the API works) prior to further doc polish.



it is
+      possible to do an offline reconstruction of the state of the
+      disk at the time of the second backup, without having to copy as
+      much data as a second full backup would require.  Most disk
+      checkpoints are created in concert with a backup,
+      via <code>virDomainBackupBegin()</code>; however, libvirt also
+      exposes enough support to create disk checkpoints independently
+      from a backup operation,
+      via <code>virDomainCheckpointCreateXML()</code>.


Thanks for the extra context.


+    </p>
+    <p>
+      Attributes of libvirt checkpoints are stored as child elements of
+      the <code>domaincheckpoint</code> element.  At checkpoint creation
+      time, normally only the <code>name</code>, <code>description</code>,
+      and <code>disks</code> elements are settable; the rest of the
+      fields are ignored on creation, and will be filled in by
+      libvirt for informational purposes


So the user is responsible for creating checkpoint names? Do we use the
same name in qcow2?

My intent is that if the user does not assign a checkpoint name, then libvirt will default it to the current time in seconds-since-the-Epoch. Then, whatever name is given to the checkpoint (whether chosen by libvirt or assigned by the user) will also be the default name of the bitmap created in each qcow2 volume, but the XML also allows you to name the qcow2 bitmaps something different than the checkpoint name (maybe not a wise idea in the common case, but could come in handy later if you use the _REDEFINE flag to teach libvirt about existing bitmaps that are already present in a qcow2 image rather than placed there by libvirt).
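
To sketch what that looks like (the names and values here are made up for illustration, not output from these patches), a fully spelled-out creation request might be:

    <domaincheckpoint>
      <name>1528882139</name>
      <description>state before database upgrade</description>
      <disks>
        <disk name='vda' checkpoint='bitmap' bitmap='1528882139'/>
      </disks>
    </domaincheckpoint>

while the minimal request <domaincheckpoint/> leaves libvirt to generate the timestamp-based name and reuse it as the bitmap name in every participating qcow2 disk.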

+    <p>
+      Checkpoints are maintained in a hierarchy.  A domain can have a
+      current checkpoint, which is the most recent checkpoint compared to
+      the current state of the domain (although a domain might have
+      checkpoints without a current checkpoint, if checkpoints have been
+      deleted in the meantime).  Creating or reverting to a checkpoint
+      sets that checkpoint as current, and the prior current checkpoint is
+      the parent of the new checkpoint.  Branches in the hierarchy can
+      be formed by reverting to a checkpoint with a child, then creating
+      another checkpoint.


This seems too complex. Why do we need arbitrary trees of checkpoints?

Because snapshots had an arbitrary tree, and it was easier to copy from snapshots. Even if we only use a linear tree for now, it is still feasible that in the future, we can facilitate a domain rolling back to the disk state as captured at checkpoint C1, at which point you could then have multiple children C2 (the bitmap created prior to rolling back) and C3 (the bitmap created for tracking changes made after rolling back). Again, for a first cut, I probably will punt and state that snapshots and incremental backups do not play well together yet; but as we get experience and add more code, the API is flexible enough that down the road we really can offer reverting to an arbitrary snapshot and ALSO updating checkpoints to match.
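
As a sketch of how that hierarchy shows up in the XML (names invented), dumping a checkpoint c2 that was created while c1 was current would include something like:

    <domaincheckpoint>
      <name>c2</name>
      <parent>
        <name>c1</name>
      </parent>
      ...
    </domaincheckpoint>

and a branch would simply be two checkpoints that both list c1 in their <parent>.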


What is the meaning of reverting a checkpoint?

Hmm - right now, you can't (that was one Snapshot API that I intentionally did not copy over to Checkpoint), so I should probably reword that.



+    </p>
+    <p>
+      The top-level <code>domaincheckpoint</code> element may contain
+      the following elements:
+    </p>
+    <dl>
+      <dt><code>name</code></dt>
+      <dd>The name for this checkpoint.  If the name is specified when
+        initially creating the checkpoint, then the checkpoint will have
+        that particular name.  If the name is omitted when initially
+        creating the checkpoint, then libvirt will make up a name for
+        the checkpoint, based on the time when it was created.
+      </dd>


Why not simplify and require the user to provide a name?

Because we didn't require the user to provide names for snapshots, and generating a name via the current timestamp is still fairly likely to be usable.



+      <dt><code>description</code></dt>
+      <dd>A human-readable description of the checkpoint.  If the
+        description is omitted when initially creating the checkpoint,
+        then this field will be empty.
+      </dd>
+      <dt><code>disks</code></dt>
+      <dd>On input, this is an optional listing of specific
+        instructions for disk checkpoints; it is needed when making a
+        checkpoint on only a subset of the disks associated with a
+        domain (in particular, since qemu checkpoints require qcow2
+        disks, this element may be needed on input for excluding guest
+        disks that are not in qcow2 format); if omitted on input, then
+        all disks participate in the checkpoint.  On output, this is
+        fully populated to show the state of each disk in the
+        checkpoint.  This element has a list of <code>disk</code>
+        sub-elements, describing anywhere from one to all of the disks
+        associated with the domain.


Why not always specify the disks?

Because if your guest uses all qcow2 images, and you don't want to exclude any images from the checkpoint, then not specifying <disks> does the right thing with less typing. Just because libvirt tries to have sane defaults doesn't mean you have to rely on them, though.
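
For example (a hand-written sketch with invented disk names), a guest with a qcow2 vda and a raw vdb might use:

    <domaincheckpoint>
      <disks>
        <disk name='vda' checkpoint='bitmap'/>
        <disk name='vdb' checkpoint='no'/>
      </disks>
    </domaincheckpoint>

to exclude the raw disk, while a guest with only qcow2 disks can omit <disks> entirely.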



+        <dl>
+          <dt><code>disk</code></dt>
+          <dd>This sub-element describes the checkpoint properties of
+            a specific disk.  The attribute <code>name</code> is
+            mandatory, and must match either the <code>&lt;target
+            dev='name'/&gt;</code> or an unambiguous <code>&lt;source
+            file='name'/&gt;</code> of one of
+            the <a href="formatdomain.html#elementsDisks">disk
+            devices</a> specified for the domain at the time of the
+            checkpoint.  The attribute <code>checkpoint</code> is
+            optional on input; possible values are <code>no</code>
+            when the disk does not participate in this checkpoint;
+            or <code>bitmap</code> if the disk will track all changes
+            since the creation of this checkpoint via a bitmap, in
+            which case another attribute <code>bitmap</code> will be
+            the name of the tracking bitmap (defaulting to the
+            checkpoint name).


Seems too complicated. Why do we need to support a checkpoint
referencing a bitmap with a different name?

For the same reason that you can support an internal snapshot referencing a qcow2 snapshot with a different name. Yeah, it's probably not a common usage, but there are cases (such as when using _REDEFINE) where it can prove invaluable. You're right that most users won't name qcow2 bitmaps differently from the libvirt checkpoint name.


Instead we can have a list of disks that will participate in the checkpoint.
Anything not specified will not participate in the checkpoint. The name of
the checkpoint is always the name of the bitmap.

My worry is about future extensibility of the XML. If the XML is too simple, then we may lock ourselves into a corner of not being able to support some other backend implementation of checkpoints (just because qemu implements checkpoints via qcow2 bitmaps does not mean that some other hypervisor won't come along that implements checkpoints via a UUID, so I tried to leave room for <disk checkpoint='uuid' uuid='..-..-...'/> as potential XML for such a hypervisor mapping - and while bitmap names different from checkpoint names are unusual, it is much more likely that UUIDs for multiple disks would have to be different per disk).



+          </dd>
+        </dl>
+      </dd>
+      <dt><code>creationTime</code></dt>
+      <dd>The time this checkpoint was created.  The time is specified
+        in seconds since the Epoch, UTC (i.e. Unix time).  Readonly.
+      </dd>
+      <dt><code>parent</code></dt>
+      <dd>The parent of this checkpoint.  If present, this element
+        contains exactly one child element, name.  This specifies the
+        name of the parent checkpoint of this one, and is used to
+        represent trees of checkpoints.  Readonly.
+      </dd>


I think we are missing the size of the underlying data for every disk here.
This probably means how many dirty bits we have in the bitmaps referenced
by the checkpoint for every disk.

That would be an output-only XML element, and only if qemu were even modified to expose that information. But yes, I can see how exposing that could be useful.



+      <dt><code>domain</code></dt>
+      <dd>The inactive <a href="formatdomain.html">domain
+        configuration</a> at the time the checkpoint was created.
+        Readonly.


What do you mean by "inactive domain configuration"?

Copy-and-paste from snapshots, but in general, what it would take to start a new VM using a restoration of the backup images corresponding to that checkpoint (that is, the XML is the smaller persistent form, rather than the larger running form; my classic example used to be that the 'inactive domain configuration' omits <alias> tags while the 'running configuration' does not - but since libvirt recently added support for user-settable <alias> tags, that no longer holds...).


+    <dl>
+      <dt><code>incremental</code></dt>
+      <dd>Optional. If this element is present, it must name an
+        existing checkpoint of the domain, which will be used to make
+        this backup an incremental one (in the push model, only
+        changes since the checkpoint are written to the destination;
+        in the pull model, the NBD server uses the
+        NBD_OPT_SET_META_CONTEXT extension to advertise to the client
+        which portions of the export contain changes since the
+        checkpoint).  If omitted, a full backup is performed.


Just to make it clear:

For example we start with:

     c1 c2 [c3]

c3 is the active checkpoint.

We create a new checkpoint:

     c1 c2 c3 [c4]

So
- using incremental=c2, we will get data referenced by c2?

Your incremental backup would get all changes since the point in time c2 was created (that is, the changes recorded by the merge of bitmaps c2 and c3).

- using incremental=c1, we will get data referenced by both c1 and c2?

Your incremental backup would get all changes since the point in time c1 was created (that is, the changes recorded by the merge of bitmaps c1, c2, and c3).


What if we want to backup only data from c1 to c2, not including c3?

Qemu can't do that right now, so this API can't do it either. Maybe there's a way to add it into the API (and the fact that we used XML leaves that door wide open), but not right now.

I don't have a use case for this, but if we can specify two checkpoints
this would be possible.

For example:

     <checkpoints from="c1" to="c2">

Or

     <checkpoints from="c2">

Or the current proposal of <incremental> serves as the 'from', and a new sibling element <limit> becomes the 'to', if it becomes possible to limit a backup to an earlier point in time than the present call to the API.
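
To make the current proposal concrete (a hand-written sketch: I'm assuming a <domainbackup> top-level element with a mode attribute to parallel <domaincheckpoint>, the names and paths are invented, and the hypothetical <limit> is not shown), a pull-mode incremental backup since c2 might look like:

    <domainbackup mode='pull'>
      <incremental>c2</incremental>
      <server transport='unix' socket='/path/to/backup.sock'/>
      <disks>
        <disk name='vda' type='file'>
          <scratch file='/path/to/vda.scratch.qcow2'/>
        </disk>
      </disks>
    </domainbackup>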



+      </dd>
+      <dt><code>server</code></dt>
+      <dd>Present only for a pull mode backup.  Contains the same
+       attributes as the <code>protocol</code> element of a disk
+       attached via NBD in the domain (such as transport, socket,
+       name, port, or tls), necessary to set up an NBD server that
+       exposes the content of each disk at the time the backup
+       started.
+      </dd>


To get the list of changed blocks, we planned to use something like:

     qemu-img map nbd+unix:///?socket=server.sock

Is this possible now? planned?

Possible via the x-nbd-server-add-bitmap command added in qemu commit 767f0c7, coupled with a client that knows how to request NBD_OPT_SET_META_CONTEXT "qemu:dirty-bitmap:foo" then read the bitmap with NBD_CMD_BLOCK_STATUS (I have a hack patch sitting on the qemu list that lets qemu-img behave as such a client: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg05993.html)


To get the actual data, oVirt needs a device to read from. We don't want
to write our own NBD client, and we cannot use qemu-img since it does not
support streaming data, and we want to stream the data over HTTP to the
backup application.

I guess we will have to do this:

    qemu-nbd -c /dev/nbd0 nbd+unix:///?socket=server.sock

And serve the data from /dev/nbd0.

Yes, except that the kernel NBD client plugin does not have support for NBD_CMD_BLOCK_STATUS, so reading /dev/nbd0 won't be able to find the dirty blocks. But you could always do it in two steps: first, connect a client that only reads the bitmap (such as qemu-img with my hack), then connect the kernel client so that you can stream just the portions of /dev/nbd0 referenced in the map of the first step. (Or, since both clients would be read-only, you can have them both connected to the qemu server at once)



+      <dt><code>disks</code></dt>
+      <dd>This is an optional listing of instructions for disks
+        participating in the backup (if omitted, all disks
+        participate, and libvirt attempts to generate filenames by
+        appending the current timestamp as a suffix). When provided on
+        input, disks omitted from the list do not participate in the
+        backup.  On output, the list is present but contains only the
+        disks participating in the backup job.  This element has a
+        list of <code>disk</code> sub-elements, describing anywhere
+        from one to all of the disks associated with the domain.
+        <dl>
+          <dt><code>disk</code></dt>
+          <dd>This sub-element describes the checkpoint properties of
+            a specific disk.  The attribute <code>name</code> is
+            mandatory, and must match either the <code>&lt;target
+            dev='name'/&gt;</code> or an unambiguous <code>&lt;source
+            file='name'/&gt;</code> of one of
+            the <a href="formatdomain.html#elementsDisks">disk
+            devices</a> specified for the domain at the time of the
+            checkpoint.  The optional attribute <code>type</code> can
+            be <code>file</code>, <code>block</code>,
+            or <code>network</code>, similar to a disk declaration
+            for a domain, and controls what additional sub-elements are
+            needed to describe the destination (such
+            as <code>protocol</code> for a network destination).  In
+            push mode backups, the primary sub-element
+            is <code>target</code>; in pull mode, the primary sub-element
+            is <code>scratch</code>; but either way,
+            the primary sub-element describes the file name to be used
+            during the backup operation, similar to
+            the <code>source</code> sub-element of a domain disk. An
+            optional sub-element <code>driver</code> can also be used to
+            specify a destination format different from qcow2.


This should be similar to the way we specify disks for a VM, right?
Anything that works as a VM disk will work for pushing backups?

Ultimately, yes, I'd like to support gluster/NBD/sheepdog/... destinations. My initial implementation is less ambitious, and supports just local files (because those are easier to test and therefore produce a demo with).
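
For the local-file push case the demo targets, a request might look something like this (again a sketch with the same assumed <domainbackup> element; paths are invented, <incremental> is omitted so this is a full backup, and I'm guessing the driver format attribute is spelled 'type' as for domain disks):

    <domainbackup mode='push'>
      <disks>
        <disk name='vda' type='file'>
          <target file='/path/to/vda.backup'/>
          <driver type='raw'/>
        </disk>
      </disks>
    </domainbackup>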

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



