Re: [Qemu-devel] [RFC] live snapshot, live merge, live block migration

Jagane Sundar <jagane@xxxxxxxxxx> · Mon, 16 May 2011 01:23:35 -0700

Hello Dor,

Let me see if I understand live snapshot correctly:
If I want to configure a VM for daily backup, then I would do
the following:
- Create a snapshot s1. s0 is marked read-only.
- Do a full backup of s0 on day 0.
- On day 1, I would create a new snapshot s2, then
  copy over the snapshot s1, which is the incremental
   backup image from s0 to s1.
- After copying s1 over, I do not need that snapshot, so
  I would live merge s1 with s0, to create a new merged
  read-only image s1'.
- On day 2, I would create a new snapshot s3, then
   copy over s2, which is the incremental backup from
   s1' to s2
- And so on...

With this sequence of operations, I would need to keep a
snapshot active at all times, in order to enable the
incremental backup capability, right?

If the base image is s0 and there is a single snapshot s1, then a
read operation from the VM will first look in s1. if the block is
not present in s1, then it will read the block from s0, right?
So most reads from the VM will effectively translate into two
reads, right?

Isn't this a continuous performance penalty for the VM,
amounting to almost doubling the read I/O from the VM?

Please read below for more comments:
2. Robustness of this solution in the face of
errors in the disk, etc. If any one of the snapshot
files were to get corrupted, the whole VM is
adversely impacted.
Since the base images and any snapshot which is not a leaf is marked as
read only there is no such risk.

What happens when a VM host reboots while a live merge of s0
and s1 is being done?
The primary goal of Livebackup architecture was to have zero
performance impact on the running VM.

Livebackup impacts performance of the VM only when the
backup client connects to qemu to transfer the modified
blocks over, which should be, say 15 minutes a day, for a
daily backup schedule VM.
In case there were lots of changing for example additional 50GB changes
it will take more time and there will be a performance hit.

Of course, the performance hit is proportional to the amount of data
being copied over. However, the performance penalty is paid during
the backup operation, and not during normal VM operation.

One useful thing to do is to evaluate the important use cases
for this technology, and then decide which approach makes
most sense. As an example, let me state this use case:
- A IaaS cloud, where VMs are always on, running off of a local
disk, and need to be backed up once a day or so.

Can you list some of the other use cases that live snapshot and
live merge were designed to solve. Perhaps we can put up a
single wiki page that describes all of these proposals.
Both solutions can serve for the same scenario:
With live snapshot the backup is done the following:

1. Take a live snapshot (s1) of image s0.
2. Newer writes goes to the snapshot s1 while s0 is read only.
3. Backup software processes s0 image.
     There are multiple ways for doing that -
     1. Use qemu-img and get the dirty blocks from former backup.
        - Currently qemu-img does not support it.
        - Nevertheless, such mechanism will work for lvm, btrfs, NetApp
     2. Mount the s0 image to another guest that runs traditional backup
        software at the file system level and let it do the backup.
4. Live merge s1->s0
     We'll use live copy for that so each write is duplicated (like your
     live backup solution).
5. Delete s1

As you can see, both approaches are very similar, while live snapshot is
more general and not tied to backup specifically.

As I explained at the head of this email, I believe that live snapshot
results in the VM read I/O paying a high penalty during normal operation
of the VM, whereas Livebackup results in this penalty being paid only
during the backup dirty block transfer operation.

Finally, I would like to bring up considerations of disk space. To expand on
my use case further, consider a Cloud Compute service with 100 VMs
running on a host. If live snapshot is used to create snapshot COW files,
then potentially each VM could grow the COW snapshot file to the size
of the base file, which means the VM host needs to reserve space for
the snapshot that equals the size of the VMs - i.e. a 8GB VM would
require an additional 8GB of space to be reserved for the snapshot,
so that the service provider could safely guarantee that the snapshot
will not run out of space.
Contrast this with livebackup, wherein the COW files are kept only when
the dirty block transfers are being done. This means that for a host with
100 VMs, if the backup server is connecting to each of the 100 qemu's
one by one and doing a livebackup, the service provider would need
to provision spare disk for at most the COW size of one VM.

Thanks,
Jagane

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html