On 6/15/20 12:10 PM, Peter Krempa wrote:
Outline the basics and how to integrate with externally created
overlays. Other topics will continue later.
Signed-off-by: Peter Krempa <pkrempa@xxxxxxxxxx>
---
docs/kbase.html.in | 3 +
docs/kbase/incrementalbackupinternals.rst | 210 ++++++++++++++++++++++
2 files changed, 213 insertions(+)
create mode 100644 docs/kbase/incrementalbackupinternals.rst
+++ b/docs/kbase/incrementalbackupinternals.rst
@@ -0,0 +1,210 @@
+================================================
+Internals of incremental backup handling in qemu
+================================================
+
+.. contents::
+
+Libvirt's implementation of incremental backups in the ``qemu`` driver uses
+qemu's ``block-dirty-bitmaps`` under the hood to track the guest visible disk
+state changes correspoiding to the points in time described by a libvirt
s/correspoiding/corresponding/
+checkpoint.
+
+There are some semantical implications how libvirt creates and manages the
s/semantical implications how/semantic implications with how/
+bitmaps which de-facto become API as they are written into the disk images and
s/images and/images, and/
+this document will try to sumarize them.
s/sumarize/summarize/
+
+Glossary
+========
+
+Checkpoint
+
+ A libvirt object which represents a named point in time of the life of the
+ vm where libvirt tracks writes the VM has done and allows then a backup of
s/has done and allows then a backup of block which changed/has done, thereby allowing a backup of only the blocks which changed/
+ block which changed. Note that state of the VM memory is _not_ captured.
+
+ A checkpoint can be created either explicitly via the corresponding API
+ which isn't very useful or is created as part of creating an
+ incremental or full backup of the VM using the ``virDomainBackupBegin`` API
+ which allows a next backup to only copy the differences.
Maybe:
A checkpoint can be created either explicitly via the corresponding API
(although this isn't very useful on its own), or simultaneously with an
incremental or full backup of the VM
+
+Backup
+
+ A copy of either all blocks of selected disks (full backup) or blocks changed
+ since a checkpoint (incremental backup) at the time the backup job was
+ started. (Blocks modified while the backup job is running are not part of the
+ backup!)
+
+Snapshot
+
+ Similarly to a checkpoint it's a point in time in the lifecycle of the VM
+ but the state of the VM including memory is captured at that point allowing
+ returning to the state later.
Hmm. We have disk-only snapshots which do not save the state of memory.
Does this paragraph need adjustment to mention the difference between a
disk-only snapshot and a full state capture? Are we redefining any of
the terms in domainstatecapture.rst, and/or should those two documents
have cross-references?
+
+Blockjob
+
+ A long running job which modifies the shape and/or location of the disk
+ backing chain (images storing the disk contents). Libvirt supports
If qemu adds block-dirty-bitmap-populate, blockjobs can also manipulate
just bitmaps.
+ ``block pull`` where data is moved up the chain towards the active layer,
+ ``block commit`` where data is moved down the chain towards the base/oldest
+ image. These blockjobs always remove images from the backing chain. Lastly
+ ``block copy`` where image is moved to a different location (and possibly
+ collapsed moving all of the data into the new location into the one image).
+
+block-dirty-bitmap (bitmap)
+
+ A data structure in qemu tracking which blocks were written by the guest
+ OS since the bitmap was created.
+
+Relationships of bitmaps, checkpoints and VM disks
+==================================================
+
+When a checkpoint is created libvirt creates a block-dirty-bitmap for every
+configured VM disk named the same way as chcheckpoint. The bitmap is actively
s/chcheckpoint/the checkpoint/
+recording which blocks were changed by the guest OS from that point on. Other
+bitmaps are not impacted by any way as they are self-contained:
+
+::
+
+ +----------------+ +----------------+
+ | disk: vda | | disk: vdb |
+ +--------+-------+ +--------+-------+
+ | |
+ +--------v-------+ +--------v-------+
+ | vda-1.qcow2 | | vdb-1.qcow2 |
+ | | | |
+ | bitmaps: chk-a | | bitmaps: chk-a |
+ | chk-b | | chk-b |
+ | | | |
+ +----------------+ +----------------+
+
+Bitmaps are created at the same time to track changes to all disks in sync and
+are active and persisted in the QCOW2 image. Oter formats currently don't
s/Oter/Other/
+support this feature.
+
+Modification of bitmaps outside of libvirt is not recommended, but when adrering
s/adrering/adhering/
+to the same semantics which the document will describe it should be safe to do
+so but obviously we can't guarantee that.
s/do so but obviously we can't guarantee that/do so, even if we obviously can't guarantee that/
+
+
+Integration with external snapshots
+===================================
+
+Handling of bitmaps
+-------------------
+
+Creating an external snapshot involves adding a new layer to the backing chain
+on top of the previous chain. In this step there are no new bitmaps created by
+default, which would mean that backups become impossible after this step.
+
+To prevent this from happening we need to re-create the active bitmaps in the
+new top/active layer of the backing chain which allows us to continue tracking
+the changes with same granularity as before and also allows libvirt to stitch
+together all the corresponding bitmaps to do a backup acorss snapshots.
s/acorss/across/
+
+After taking a snapshot of the ``vda`` disk from the example above placed into
+``vda-2.qcow2`` the following topology will be created:
+
+::
+
+ +----------------+
+ | disk: vda |
+ +-------+--------+
+ |
+ +-------v--------+ +----------------+
+ | vda-2.qcow2 | | vda-1.qcow2 |
+ | | | |
+ | bitmaps: chk-a +----> bitmaps: chk-a |
+ | chk-b | | chk-b |
+ | | | |
+ +----------------+ +----------------+
+
+Checking bitmap health
+----------------------
+
+QEMU optimizes disk writes by only updating the bitmaps in certain cases. This
+also can cause problems in cases when e.g. QEMU crashes.
+
+For a chain of bitmaps corresponding in a backing chain to be considered valid
s/bitmaps corresponding/corresponding bitmaps/
+and eligible for use with ``virDomainBackupBegin`` it must conform to the
+following rules:
+
+1) Top image must contain the bitmap
+2) If any of the backing images in the chain contain the bitmap too all
s/too all/too, all/
+ contiguous images must have the bitmap (no gaps)
+3) all of the above bitmaps must be marked as active
+ (``auto`` flag in ``qemu-img`` output, ``recording`` in qemu)
+4) none of the above bitmaps can be inconsistent
+ (``in-use`` flag in ``qemu-img`` provided that it's not used on image which
+ is currently in use by a qemu instance, or ``inconsistent`` in qemu)
+
+::
+
+ # check that image has bitmaps
+ $ qemu-img info vda-1.qcow2
+ image: vda-1.qcow2
+ file format: qcow2
+ virtual size: 100 MiB (104857600 bytes)
+ disk size: 220 KiB
+ cluster_size: 65536
+ Format specific information:
+ compat: 1.1
+ compression type: zlib
+ lazy refcounts: false
+ bitmaps:
+ [0]:
+ flags:
+ [0]: in-use
+ [1]: auto
+ name: chk-a
+ granularity: 65536
+ [1]:
+ flags:
+ [0]: auto
+ name: chk-b
+ granularity: 65536
+ refcount bits: 16
+ corrupt: false
+
+(See also the ``qemuBlockBitmapChainIsValid`` helper method in
+``src/qemu/qemu_block.c``)
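It might help readers to see the four rules spelled out as code. The following is only a rough sketch and is not how ``qemuBlockBitmapChainIsValid`` is implemented; the function names are made up for illustration, and it assumes the flags come from ``qemu-img info`` run on an image that is not currently opened by a qemu instance (otherwise ``in-use`` is expected and says nothing about consistency).

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Rules 3 and 4: a bitmap is usable if it carries the 'auto' flag
# and does not carry the 'in-use' flag.
# $1: whitespace-separated list of flags as reported by qemu-img
bitmap_flags_usable() {
    _has_auto=no
    for _flag in $1; do
        case "$_flag" in
            in-use) return 1 ;;
            auto) _has_auto=yes ;;
        esac
    done
    [ "$_has_auto" = yes ]
}

# Rules 1 and 2: the top image must contain the bitmap, and the images
# containing it must be contiguous starting from the top (no gaps).
# $@: one word per image, top image first: 'yes' if the bitmap is
#     present in that image, 'no' otherwise
bitmap_chain_contiguous() {
    [ "$1" = yes ] || return 1
    _gap=no
    for _present in "$@"; do
        if [ "$_present" = yes ]; then
            # a bitmap reappearing below a gap breaks rule 2
            [ "$_gap" = no ] || return 1
        else
            _gap=yes
        fi
    done
    return 0
}
```

With the example output above, ``bitmap_flags_usable "in-use auto"`` fails for chk-a while ``bitmap_flags_usable "auto"`` succeeds for chk-b.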
+
+Creating external checkpoints manually
s/checkpoints/snapshots/
+--------------------------------------
+
+To create the same topology outside of libvirt (e.g when doing snapshots offline)
+a new ``qemu-img`` which supports the ``bitmap`` subcomand is necessary. The
s/subcomand/subcommand/
s/necessary/recommended/ (as it is also possible to use 'qemu-kvm -S' to
do the same actions via QMP commands - although I'm not sure if it is
worth documenting that fallback)
+following algorithm then ensures that the new image after snapshot will work
+with backups (note that ``jq`` is a JSON processor):
+
+::
+
+ # arguments
+ SNAP_IMG="vda-2.qcow2"
+ BACKING_IMG="vda-1.qcow2"
+
+ # constants - snapshots and bitmaps work only with qcow2
+ SNAP_FMT="qcow2"
+ BACKING_IMG_FMT="qcow2"
+
+ # create snapshot overlay
+ qemu-img create -f "$SNAP_FMT" -F "$BACKING_IMG_FMT" -b "$BACKING_IMG" "$SNAP_IMG"
+
+ BACKING_IMG_INFO=$(qemu-img info --output=json -f "$BACKING_IMG_FMT" "$BACKING_IMG")
+ BACKING_BITMAPS=$(jq '."format-specific".data.bitmaps' <<< "$BACKING_IMG_INFO")
<<< is a bashism.
+
+ if [ "x$BACKING_BITMAPS" == "xnull" ]; then
So is == instead of =. Either we should tweak this to be portable to
dash, or you should add a #!/bin/bash line to the top of the example.
+ exit 0
+ fi
+
+ for BACKING_BITMAP_ in $(jq -c '.[]' <<< "$BACKING_BITMAPS"); do
+ BITMAP_FLAGS=$(jq -c -r '.flags[]' <<< "$BACKING_BITMAP_")
+ BITMAP_NAME=$(jq -r '.name' <<< "$BACKING_BITMAP_")
+
+ if grep 'in-use' <<< "$BITMAP_FLAGS" ||
+ grep -v 'auto' <<< "$BITMAP_FLAGS"; then
+ continue
+ fi
+
+ qemu-img bitmap -f "$SNAP_FMT" "$SNAP_IMG" --add "$BITMAP_NAME"
Do you want to also copy the --granularity of the bitmaps being added?
+
+ done
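To make the comments on this script concrete, here is a rough, untested sketch of the bitmap-copying part that avoids the bashisms and also carries over the granularity. It assumes ``jq`` is installed and that the installed ``qemu-img bitmap`` accepts ``-g`` together with ``--add``; the ``QEMU_IMG`` override and the function names are made up here, and the format is hardcoded to qcow2 rather than taken from ``SNAP_FMT``.

```shell
#!/bin/sh
# Sketch only; helper names and the QEMU_IMG override are hypothetical.
QEMU_IMG=${QEMU_IMG:-qemu-img}

# Add a bitmap to an image, passing -g only when a granularity is known.
# $1: image, $2: bitmap name, $3: granularity (may be empty or 'null')
add_bitmap() {
    case "$3" in
        ''|null) "$QEMU_IMG" bitmap -f qcow2 --add "$1" "$2" ;;
        *)       "$QEMU_IMG" bitmap -f qcow2 --add -g "$3" "$1" "$2" ;;
    esac
}

# Re-create every active, consistent bitmap of the backing image in the
# overlay. Bitmaps that are 'in-use' or lack 'auto' are skipped by jq.
# $1: overlay image
# $2: output of 'qemu-img info --output=json' for the backing image
copy_bitmaps() {
    printf '%s\n' "$2" |
        jq -r '."format-specific".data.bitmaps // [] | .[]
               | select(any(.flags[]?; . == "auto")
                        and (any(.flags[]?; . == "in-use") | not))
               | "\(.granularity)\t\(.name)"' |
        while IFS="$(printf '\t')" read -r _gran _name; do
            # keep qemu-img away from the loop's stdin
            add_bitmap "$1" "$_name" "$_gran" </dev/null || exit 1
        done
}
```

After creating the overlay this would be called as ``copy_bitmaps "$SNAP_IMG" "$BACKING_IMG_INFO"``. Piping through ``printf`` replaces the here-strings, and reading tab-separated fields avoids word-splitting the JSON in the ``for`` loop.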
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org