Overview of libvirt incremental backup API, part 1 (full pull mode)

The following (long) email describes a portion of the workflow of my proposed incremental backup APIs, along with the backend QMP commands that each libvirt API call executes. I will reply to this thread with further examples (the first example is long enough to be its own email). This is an update to a thread last posted here:
https://www.redhat.com/archives/libvir-list/2018-June/msg01066.html

I'm still pulling together pieces in the src/qemu directory of libvirt while implementing my proposed API, but you can track the status of my code (currently around 6000 lines of code and 1500 lines of documentation added) at:
https://repo.or.cz/libvirt/ericb.git
The documentation below describes the end goal of my demo (which I will be presenting at KVM Forum), even if the current git checkout of my work in progress doesn't quite behave that way.

My hope is that the API itself is in a stable enough state to include in the libvirt 4.9 release (end of this month - which really means an upstream commit prior to KVM Forum!) by demoing how it is used with qemu experimental commands, even if the qemu driver portions of my series are not yet ready to be committed because they are waiting for the qemu side of incremental backups to stabilize. If we like the API and are willing to commit to it, then downstream vendors can backport later qemu driver fixes on top of the existing API without suffering from rebase barriers that prevent the addition of new API.

Performing a full backup can work on any disk format, but incremental (all changes since the most recent checkpoint) and differential (all changes since an arbitrary earlier checkpoint) backups require the use of a persistent bitmap for tracking the changes between checkpoints, and that in turn requires a disk with qcow2 format. The API can handle multiple disks at the same point in time (so I'll demonstrate two at once), and is designed to handle both push model (qemu writes to a specific destination, and the format has to be one that qemu knows) and pull model (qemu opens up an NBD server for all disks, then you connect one or more read-only client per export on that server to read the information of interest into a destination of your choosing).
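Since persistent bitmaps require qcow2, a quick way to double-check a disk image's format before planning incremental backups (a hedged aside using this demo's paths, not a step of the API itself) is:

$ qemu-img info /path/to/disk1.img | grep 'file format'
file format: qcow2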

This demo also shows how I consume the data over a pull model backup. Remember, in the pull model, you don't have to use a qemu binary as the NBD client (you merely need a client that can request base:allocation and qemu:dirty-bitmap:name contexts) - it's just that it is easier to demonstrate everything with the tools already at hand. Thus, I use existing qemu-img 3.0 functionality to extract the dirty bitmap (the qemu:dirty-bitmap:name context) in one process, and a second qemu-io process (using base:allocation to optimize reads of holes) for extracting the actual data; the demo shows both processes accessing the read-only NBD server in parallel. While I use two processes, it is also feasible to write a single client that can get at both contexts through a single NBD connection (the qemu 3.0 server supports that, even if none of the qemu 3.0 clients can request multiple contexts). Down the road, we may further enhance tools shipped with qemu to be easier to use as such a client, but that does not affect the actual backup API (which is merely what it takes to get the NBD server up and running).

- Preliminary setup:
I'm using bash as my shell, and set

$ orig1=/path/to/disk1.img orig2=/path/to/disk2.img
$ dom=my_domain qemu_img=/path/to/qemu-img
$ virsh="/path/to/virsh -k 0"

to make later steps easier to type. While the steps below should work with qemu 3.0, I found it easier to test with both a self-built qemu (modify the <emulator> line in my domain) and a self-built libvirtd (systemctl stop libvirtd, then run src/libvirtd; this is also why $virsh above disables the keepalive heartbeat with -k 0, so that I could attach gdb during development without worrying about the connection dying). Also, you may need 'setenforce 0' when using self-built binaries, since otherwise SELinux labeling gets weird (obviously, when the actual code is ready to check into libvirt, it will work with SELinux enforcing and with system-installed rather than self-installed binaries). I also used:

$ $virsh domblklist $dom

to verify that I have plugged in $orig1 and $orig2 as two of the disks in $dom; a sketch of the expected output follows the XML below (I used:
    <disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source file='/path/to/disk1.img'/>
      <backingStore/>
      <target dev='sdc' bus='scsi'/>
    </disk>
    <disk type='file' device='disk'>
<driver name='qemu' type='qcow2' cache='none' error_policy='stop' io='native'/>
      <source file='/path/to/disk2.img'/>
      <backingStore/>
      <target dev='sdd' bus='scsi'/>
    </disk>
in my domain XML)
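For reference, the domblklist output for this setup looks roughly like the following (a hedged sketch; the exact column layout varies by virsh version, and the targets and paths are whatever your domain actually uses):

Target     Source
------------------------------------------------
sdc        /path/to/disk1.img
sdd        /path/to/disk2.img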

- First example: creating a full backup via pull model, initially with no checkpoint created
$ cat > backup.xml <<EOF
<domainbackup mode='pull'>
  <server transport='tcp' name='localhost' port='10809'/>
  <disks>
    <disk name='$orig1' type='file'>
      <scratch file='$PWD/scratch1.img'/>
    </disk>
    <disk name='sdd' type='file'>
      <scratch file='$PWD/scratch2.img'/>
    </disk>
  </disks>
</domainbackup>
EOF
$ $qemu_img create -f qcow2 -b $orig1 -F qcow2 scratch1.img
$ $qemu_img create -f qcow2 -b $orig2 -F qcow2 scratch2.img

Here, I'm explicitly requesting a pull backup (the API defaults to push otherwise), as well as explicitly requesting the NBD server to be set up (the XML should support both transport='tcp' and transport='unix'). Note that the <server> is global, but the server will support multiple export names at once, so that you can connect multiple clients to process those exports in parallel. Ideally, if <server> is omitted, libvirt should auto-generate an appropriate server name and offer a way for you to query what it generated (right now, I don't have that working in libvirt, so being explicit is necessary - but again, the goal now is to prove that the API is reasonable for inclusion in libvirt 4.9; enhancements like making <server> optional can come later even if they miss libvirt 4.9).

I'm also requesting that the backup operate on only two disks of the domain, and pointing libvirt to the scratch storage it needs to use for the duration of the backup (ideally, libvirt will generate an appropriate scratch file name itself if omitted from the XML, and create the scratch files itself instead of me having to pre-create them). Note that I can give either the path to my original disk ($orig1, $orig2) or the target name in the domain XML (in my case sdc, sdd); libvirt normalizes my input and always uses the target name when reposting the XML in output.
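For completeness, the unix transport variant of that <server> element would look something like the following (a hedged sketch; I'm assuming a socket= attribute naming the listening socket path, and the path itself is arbitrary):

  <server transport='unix' socket='$PWD/backup.sock'/>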

$ $virsh backup-begin $dom backup.xml
Backup id 1 started
backup used description from 'backup.xml'

Kicks off the backup job. virsh called
 virDomainBackupBegin(dom, "<domainbackup ...>", NULL, 0)
and in turn libvirt makes all of the following QMP calls (if any QMP call fails, libvirt attempts to unroll things so that there is no lasting change to the guest before actually reporting the failure):
{"execute":"nbd-server-start",
 "arguments":{"addr":{"type":"inet",
  "data":{"host":"localhost", "port":"10809"}}}}
{"execute":"blockdev-add",
 "arguments":{"driver":"qcow2", "node-name":"backup-sdc",
  "file":{"driver":"file",
   "filename":"$PWD/scratch1.img"},
   "backing":"'$node1'"}}
{"execute":"blockdev-add",
 "arguments":{"driver":"qcow2", "node-name":"backup-sdd",
  "file":{"driver":"file",
   "filename":"$PWD/scratch2.img"},
   "backing":"'$node2'"}}
{"execute":"transaction",
 "arguments":{"actions":[
  {"type":"blockdev-backup", "data":{
   "device":"$node1", "target":"backup-sdc", "sync":"none",
   "job-id":"backup-sdc" }},
  {"type":"blockdev-backup", "data":{
   "device":"$node2", "target":"backup-sdd", "sync":"none",
   "job-id":"backup-sdd" }}
 ]}}
{"execute":"nbd-server-add",
 "arguments":{"device":"backup-sdc", "name":"sdc"}}
{"execute":"nbd-server-add",
 "arguments":{"device":"backup-sdd", "name":"sdd"}}

libvirt populated $node1 and $node2 to be the node names actually assigned by qemu; until Peter's work on libvirt using node names everywhere actually lands, libvirt is scraping the auto-generated #blockNNN name from query-block and friends (the same as it already does in other situations like write threshold).
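As a concrete illustration of that scraping (a hedged sketch; the node name and device id shown here are made up, and the real values are whatever qemu generated for your domain), the relevant fragment of a query-block exchange looks like:

{"execute":"query-block"}
{"return":[
 {"device":"drive-scsi0-0-0-2", "locked":false, "removable":false,
  "inserted":{"node-name":"#block123", "drv":"qcow2",
   "file":"/path/to/disk1.img", ...}},
 ...]}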

With this command complete, libvirt has now kicked off a pull backup job, which includes a single qemu NBD server with two separate exports named 'sdc' and 'sdd' that expose the state of the disks at the time of the API call (any guest writes to $orig1 or $orig2 trigger copy-on-write actions into scratch1.img and scratch2.img, preserving the fact that reading from NBD sees unchanging contents).
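As a quick sanity check that the exports are live (a hedged aside, not part of the API flow), any NBD client can read from them while the guest keeps running; for example, dumping the first 512 bytes of the 'sdc' export:

$ qemu-io -r -f raw -c 'r -v 0 512' nbd://localhost:10809/sdc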

We can double-check what libvirt is tracking for the running backup job, including the fact that libvirt normalized the <disk> names to match the domain XML target listings, and matching the names of the exports being served over the NBD server:

$ $virsh backup-dumpxml $dom 1
<domainbackup type='pull' id='1'>
  <server transport='tcp' name='localhost' port='10809'/>
  <disks>
    <disk name='sdc' type='file'>
      <scratch file='/home/eblake/scratch1.img'/>
    </disk>
    <disk name='sdd' type='file'>
      <scratch file='/home/eblake/scratch2.img'/>
    </disk>
  </disks>
</domainbackup>

where 1 on the command line would be replaced by whatever id was printed by the earlier backup-begin command (yes, my demo can hard-code things to 1, because the current qemu and initial libvirt implementations only support one backup job at a time, although we have plans to allow parallel jobs in the future).
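In a script, rather than hard-coding the id, it can be captured when starting the job (a hedged sketch that assumes the 'Backup id N started' wording shown above remains stable):

$ id=$($virsh backup-begin $dom backup.xml | awk '/^Backup id/ {print $3}')
$ $virsh backup-dumpxml $dom $id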

This translated to the libvirt API call
 virDomainBackupGetXMLDesc(dom, 1, 0)
and did not have to make any QMP calls into qemu.

Now that the backup job is running, we want to scrape the data off the NBD server. The most naive way is:

$ $qemu_img convert -f raw -O $fmt nbd://localhost:10809/sdc full1.img
$ $qemu_img convert -f raw -O $fmt nbd://localhost:10809/sdd full2.img

where we hope that qemu-img convert is able to recognize the holes in the source and only write into the backup copy where actual data lives. You don't have to use qemu-img; it's possible to use any NBD client, such as the kernel NBD module:

$ modprobe nbd
$ qemu-nbd -c /dev/nbd0 -f raw nbd://localhost:10809/sdc
$ cp /dev/nbd0 full1.img
$ qemu-nbd -d /dev/nbd0

The above demonstrates the flexibility of the pull model (your backup file can be ANY format you choose; here I did 'cp' to copy it to a raw destination), but it is also a less efficient NBD client, since the kernel module doesn't yet know about NBD_CMD_BLOCK_STATUS for learning where the holes are, nor about NBD_OPT_STRUCTURED_REPLY for faster reads of those holes.

Of course, we don't have to blindly read the entire image; instead we can use two clients in parallel (per exported disk): one runs 'qemu-img map' to learn which parts of the export contain data, feeding the result through a bash 'while read' loop to parse out which offsets contain interesting data, and a second client is spawned per region to copy just that subset of the file. Here, I'll use 'qemu-io -C' to perform copy-on-read - that requires that my output file be qcow2 rather than any other particular format, but I'm guaranteed that my output backup file is only populated in the same places that $orig1 was populated at the time the backup started.

$ $qemu_img create -f qcow2 full1.img $size_of_orig1
$ $qemu_img rebase -u -f qcow2 -F raw -b nbd://localhost:10809/sdc \
  full1.img
$ while read line; do
  [[ $line =~ .*start.:.([0-9]*).*length.:.([0-9]*).*data.:.true.* ]] ||
    continue
  start=${BASH_REMATCH[1]} len=${BASH_REMATCH[2]}
  qemu-io -C -c "r $start $len" -f qcow2 full1.img
done < <($qemu_img map --output=json -f raw nbd://localhost:10809/sdc)
$ $qemu_img rebase -u -f qcow2 -b '' full1.img

and the nice thing about this loop is that once you've figured out how to parse qemu-img map output in one client process, you can use any other process (such as qemu-nbd -c, then dd if=/dev/nbd0 of=$dest bs=64k skip=$((start/64/1024)) seek=$((start/64/1024)) count=$((len/64/1024)) conv=fdatasync) as the NBD client that reads the subset of data of interest. Thus, while qemu-io had to write to full1.img as qcow2, you can use an alternative client to write to raw or any other format of your choosing.
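For instance, here is a hedged sketch of that alternative, reusing the same map-parsing loop but writing a raw destination via the kernel NBD module (I've added conv=notrunc so that dd's seek does not truncate the pre-sized output file, and the bs=64k arithmetic assumes the extents reported by the map are 64k-aligned, which holds for qcow2 images with the default cluster size):

$ modprobe nbd
$ qemu-nbd -c /dev/nbd0 -f raw nbd://localhost:10809/sdc
$ truncate -s $size_of_orig1 full1.raw
$ while read line; do
  [[ $line =~ .*start.:.([0-9]*).*length.:.([0-9]*).*data.:.true.* ]] ||
    continue
  start=${BASH_REMATCH[1]} len=${BASH_REMATCH[2]}
  dd if=/dev/nbd0 of=full1.raw bs=64k skip=$((start/64/1024)) \
    seek=$((start/64/1024)) count=$((len/64/1024)) conv=fdatasync,notrunc
done < <($qemu_img map --output=json -f raw nbd://localhost:10809/sdc)
$ qemu-nbd -d /dev/nbd0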

Now that we've copied off the full backup image (or just a portion of it - after all, this is a pull model where we are in charge of how much data we want to read), it's time to tell libvirt that it can conclude the backup job:

$ $virsh backup-end $dom 1
Backup id 1 completed

again, where the command-line '1' came from the output of backup-begin and could change to something else rather than being hard-coded in the demo. This maps to the libvirt API call
 virDomainBackupEnd(dom, 1, 0)
which in turn maps to the QMP commands:
{"execute":"nbd-server-remove",
 "arguments":{"name":"sdc"}}
{"execute":"nbd-server-remove",
 "arguments":{"name":"sdd"}}
{"execute":"nbd-server-stop"}
{"execute":"block-job-cancel",
 "arguments":{"device":"sdc"}}
{"execute":"block-job-cancel",
 "arguments":{"device":"sdd"}}
{"execute":"blockdev-del",
 "arguments":{"node-name":"backup-sdc"}}
{"execute":"blockdev-del",
 "arguments":{"node-name":"backup-sdd"}}

to clean up all the things added during backup-begin.
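Once backup-end has returned, qemu no longer references the scratch files, so they can simply be discarded (a hedged aside; in my current demo this cleanup is manual rather than something libvirt does for you):

$ rm scratch1.img scratch2.img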

More to come in part 2.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
