Re: RFC: revert to external snapshot API

Peter Krempa <pkrempa@xxxxxxxxxx> · Wed, 8 Sep 2021 11:27:43 +0200

On Tue, Aug 31, 2021 at 16:46:25 +0300, Nikolay Shirokovskiy wrote:
> Hi, all.

Hi, sorry for the late reply I was on PTO.

> I want to implement reverting to external snapshot functionality.
> I've checked the mailing list history and found some previous attempts
> (not sure if this is a complete list of them).
> 
> [1] was done in 2012 by the Redhat team itself. [2] and [3] was done in
> 2018.
> Looks like it was not clear how the API should look like.

One additional thing is that me and phrdina started discussing this (in
person so I can't point you to a discussion) 2 weeks ago.

I'll summarize the points we agreed upon.

> For example we have disk.qcow2 <- disk.snap1 chain <- disk.snap2 chain
> (after
> having snap1 and snap2 snapshots). Now we want to revert to snap1 snapshot.

There's one implementation snag we currently have which complicates
stuff. Let's expand your above scenario with the snapshot states:

                           s                      s
                           n                      n
                           a                      a
                           p                      p
                           1                      2
                                                                          p
o                          │                      │                       r
r    base.qcow2            │    snap1.qcow2       │    snap2.qcow2        e
i ───────────────────────► │ ────────────────────►│ ────────────────────► s
g                          │                      │                       e
i                          │                      │                       n
n                                                                         t

A rather big set of problems which we will necessarily encounter when
implementing this comes from the following:

1) users can modify the VM betwen snapshots or between the last snapshot
and the abandoned state and this modification (e.g. changing a disk
image) must be considered to prevent data loss when manipulating the
images.

2) libvirt doesn't record the XML used to create the snapshot (e.g.
snap1.xm) thus we don't actually know what the (historical) snapshot was
actually recording. Coupled with the fact that the user could have
changed the VM definition between 'snap1' and 'snap2' we can't even
infer what the state was.

> The
> snapshot state is held by disk.qcow2 image. We can run reverted domain on
> disk.qcow2 itself but then we lose snap1 (held by disk.qcow2) and snap2
> (held by disk.snap1). So we definitely should run on some overlay over
> disk.qcow2. But then what name should be given to overlay? We should have
> an option for mgmt to specify this name like in case of snapshots itself.

Exactly. Reversion of external snapshots will necessarily require a new
API, which will take a new "snapshot" XML describing the new overlays as
you describe below.

In the simple case such as with local files we can use the same
algorithm for creating overlay filenames as we do when creating
snapshots but generally we need to give the MGMT the ability to specify
the overlay name.

> The [1] has some discussion on adding extra options to reverting
> functionality.
> Like for example if we revert back to snap2 then having the ability to run
> from
> state in disk.snap2 rather than disk.snap1. My opinion is there is no need
> to
> as if one wants to revert to the state of disk2.snap2 it can take a
> snapshot (some
> snap3).

It's possible to avoid doing a combined "take snapshot&revert" operation
as long as we have the possibility to take a snapshot and destroy the
VM as running it after the point you want to return to is undesirable
and pointless.

One thing that you need to consider here is that when you are reverting
to an arbitrary snapshot, the overlay files after the last snapshot
(e.g. snap2.qcow2 (in my diagram), which is the state between snap2 and
present) are abandoned and will never be used again.

At this point we disucssed that we should be removing those as
semantically it's the only point where we can do that as we know the
state of the VM which will be abandoned.

The problem lies in the fact that between 'snap2' and present the user
could have exchanged disks and then we don't have enough metadata to
create the overlays.

One big additional caveat is that if the user exchanged disk images we
_must not_ delete the changed image, so the metadata wrangling might be
non-trivial in this case.

> At the same time one needs to be aware that revert operation loses
> current state and later one can revert only to the states of snapshots.
> This is the way internal snapshots work and the way one expects external
> snapshots to work too.

That is an acceptable caveat for the user. As noted above as long as you
can take a snapshot & atomicaly destroy the VM it's acceptable.

For libvirt it's harder a bit as described above especially if we don't
want to keep litering the disk with unused and invalid images.

> The [2] takes an approach of reusing current top image as overlay on revert
> so
> that in the above example disk.snap2 will be overlay over disk.qcow2 on
> reverting to snap1 snapshot. IMHO this is a confusing naming scheme.

No in our discussion with Pavel  we've ruled out all special cases as
too much hassle to implement with little benefit.

If a user wants to rever&abandon some state, they can do 2 operations of
reverting and then deleting the unwanted snapshots.

> > The [3] suggests a different scheme for naming images. For example after
> taking
> snapshot snap1 the chain should be disk.snap1 <- disk.qcow2 which looks very
> appealing to me. With this naming using the [2] approach is quite natural.

IMO we can't really change the naming any more, users baked scripts on
top of this. Using a flag is also mostly pointless as users will keep
creating snapshots without it. Thus I don't think this is a viable
approach. The only advantage it would have that it would be simpler to
vizualize what happened, but ideally users would not need to care about
the steps and filenames at all.

> Implementing this does not look hard even for a running domain but this is
> a big change to API and all mgmt need to be aware of (well it could be done
> safely using a new flag).

Yup, too much hassle, questionable outcome. Not worth it.

> 
> Anyway we can go on with current image names. In order to specify overlay
> image name let's introduce new API:
> 
>     int virDomainRevertToSnapshotXML(virDomainSnapshotPtr snapshot,
>                                      char *xmlDesc,
>                                      unsigned int flag

I was briefly considering that the new snapshot api should also have
provisions for specifying a snapshot XML to take the snapshot of the
abandoned state, but as discussed it's not really necessary if you can
do it in 2 steps.

> with XML like:
> 
>     <domainsnapshotrevert>
>       <disks>
>         <disk name='vda'>
>           <source file='/path/to/revert/overlay'/>
>         </disk>
>       </disks>
>     </domainsnapshotrevert>
> 
> Having an XML looks like a bit overkill right now but I could not
> find a better solution.

I don't see any other option. In case you have e.g. networked storage
where we can't figure out stuff by ourselves you basically don't have
other option.

> If overlay name is omitted the generated name will be like disk.snap1-1 in
> the
> above example to be in alignment with creating a snapshot case. So that
> disk.snap1*
> images hold states derived from snap1 state. We can also support reverting
> to
> external snapshot thru existing virDomainRevertToSnapshot for those who
> rely on
> generated names using this naming scheme.

Yes, this would be possible, but quite fragile. But with regular
snapshots without any magic it should work.

One thing you've missed though is that deletion of snapshots now becomes
quite a challenge.

                     s              s              s
                     n              n              n
                     a              a              a
                     p              p              p
                     1              2              3

o                    │              │              │
r    base.qcow2      │  snap1.qcow2 │  snap2.qcow2 │
i ─────────────────► │ ────────────►│ ────────────►│
g                    │              │              │
i                    │              │              │
n                    │
                     │                                    p
                     │                                    r
                     │  alternatehistory.qcow2            e
                     │ ────────────────────────────────►  s
                     │                                    e
                     │                                    n
                                                          t

Specifically the premise is that we don't want to keep unnecessary
images (e.g. the trivial solution for deleting 'snap1' would be to
remove the metadata and keep base.qcow2 intact) as without the metadata
it will become very hard to determine which images are actually still
used.

So in the above scenario:

- User wants to delete snap3:
  'snap2.qcow2' overlay becomes unreachable and should be deleted.

- User wants to delete snap2:
  There are 2 options in this case:
    1) block-pull snap1 into snap2
    2) block-commit snap2 into snap1

  The optimal solution is the one that transfers less data, but in the
  end either of them is equally good in terms of saved storage.

- User wants to delete snap1:
  This is a bit harder:
   1) commit into base.qcow2 would invalidate 'alternatehistory.qcow2'
   2) pull of base into both snap1.qcow2 and alternatehistory.qcow2 is
      possible, but may yield big images
   3) the last option is to forbid deletion of snapshots that has
      alternate histories

The first thing we should do though is to forbid creation of mixed trees
of internal and external snapshots because the matrix of stuff to handle
would explode way beyond this already rather complex situation.