Re: RFC: revert to external snapshot API

Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx> · Mon, 13 Sep 2021 17:52:49 +0300

On Wed, 8 Sep 2021 at 12:27, Peter Krempa <pkrempa@xxxxxxxxxx> wrote:
On Tue, Aug 31, 2021 at 16:46:25 +0300, Nikolay Shirokovskiy wrote:

> Hi, all.

Hi, sorry for the late reply; I was on PTO.

> I want to implement reverting to external snapshot functionality.
> I've checked the mailing list history and found some previous attempts
> (not sure if this is a complete list of them).
>
> [1] was done in 2012 by the Red Hat team itself. [2] and [3] were done in
> 2018.
> It looks like it was not clear what the API should look like.

One additional thing is that phrdina and I started discussing this (in
person, so I can't point you to a discussion) 2 weeks ago.

I'll summarize the points we agreed upon.

> For example, we have the chain disk.qcow2 <- disk.snap1 <- disk.snap2
> (after taking the snap1 and snap2 snapshots). Now we want to revert to the
> snap1 snapshot.

There's one implementation snag we currently have which complicates
stuff. Let's expand your above scenario with the snapshot states:

                           s                      s
                           n                      n
                           a                      a
                           p                      p
                           1                      2
                                                                          p
o                          │                      │                       r
r    base.qcow2            │    snap1.qcow2       │    snap2.qcow2        e
i ───────────────────────► │ ────────────────────►│ ────────────────────► s
g                          │                      │                       e
i                          │                      │                       n
n                                                                         t

A rather big set of problems which we will necessarily encounter when
implementing this comes from the following:

1) users can modify the VM between snapshots or between the last snapshot
and the abandoned state, and this modification (e.g. changing a disk
image) must be considered to prevent data loss when manipulating the
images.

Hi, Peter!

Can you please explain in more detail what the issue is? For example, if
after snap1 I changed the disk image to some base2.qcow2, then snap1.qcow2 is
expected to be managed by mgmt (expected to be deleted, I guess). Then I took
a snap2. Now I have base.qcow2 (keeping the snap1 state) and the chain
base2.qcow2 (keeping the snap2 state) <- snap2.qcow2. Thanks to the metadata
I can revert to both snap1 and snap2, as I know the domain XML in those
states and these XMLs reference the right images (snap1 references
base.qcow2 and snap2 references base2.qcow2).
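
To make the point concrete, here is a minimal sketch (Python, standard library only) of reading the disk sources out of two snapshot metadata documents. The XML below is heavily trimmed and the /images/... paths are made up for illustration; only the idea that each snapshot carries its own <domain> definition is taken from the discussion above.

```python
import xml.etree.ElementTree as ET

# Trimmed-down snapshot metadata; image paths are hypothetical.
SNAP1_XML = """
<domainsnapshot>
  <name>snap1</name>
  <domain>
    <devices>
      <disk type='file' device='disk'>
        <source file='/images/base.qcow2'/>
        <target dev='vda'/>
      </disk>
    </devices>
  </domain>
</domainsnapshot>
"""

# Same domain after the user swapped the disk image and took snap2.
SNAP2_XML = SNAP1_XML.replace("snap1", "snap2").replace("base.qcow2", "base2.qcow2")


def disk_sources(snapshot_xml):
    """Map each disk target (e.g. 'vda') to the image the snapshot references."""
    root = ET.fromstring(snapshot_xml)
    result = {}
    for disk in root.findall("./domain/devices/disk"):
        target = disk.find("target")
        source = disk.find("source")
        if target is not None and source is not None:
            result[target.get("dev")] = source.get("file")
    return result


sources = {
    "snap1": disk_sources(SNAP1_XML),
    "snap2": disk_sources(SNAP2_XML),
}
```

With metadata like this, each snapshot resolves to its own image even though the disk was exchanged in between.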

2) libvirt doesn't record the XML used to create the snapshot (e.g.

That is not quite so. According to https://libvirt.org/formatsnapshot.html#example
the XML given to the create API is saved in the snapshot metadata. Or am I missing something?

snap1.xml) thus we don't actually know what the (historical) snapshot was
recording. Coupled with the fact that the user could have
changed the VM definition between 'snap1' and 'snap2', we can't even
infer what the state was.

Here too I cannot follow what the issue is. Can you please give an example?

> The snapshot state is held by the disk.qcow2 image. We can run the reverted
> domain on disk.qcow2 itself, but then we lose snap1 (held by disk.qcow2) and
> snap2 (held by disk.snap1). So we definitely should run on some overlay over
> disk.qcow2. But then what name should be given to the overlay? We should have
> an option for mgmt to specify this name, as in the case of snapshots
> themselves.

Exactly. Reversion of external snapshots will necessarily require a new
API, which will take a new "snapshot" XML describing the new overlays as
you describe below.

In the simple case, such as with local files, we can use the same
algorithm for creating overlay filenames as we do when creating
snapshots, but generally we need to give the MGMT the ability to specify
the overlay name.

> [1] has some discussion on adding extra options to the reverting
> functionality, for example, when reverting back to snap2, having the
> ability to run from the state in disk.snap2 rather than disk.snap1. My
> opinion is that there is no need for this: if one wants to revert to the
> state of disk.snap2, one can take a snapshot (some snap3).

It's possible to avoid doing a combined "take snapshot&revert" operation
as long as we have the possibility to take a snapshot and destroy the
VM, as running it after the point you want to return to is undesirable
and pointless.

One thing that you need to consider here is that when you are reverting
to an arbitrary snapshot, the overlay files after the last snapshot
(e.g. snap2.qcow2 (in my diagram), which is the state between snap2 and
present) are abandoned and will never be used again.

At this point we discussed that we should be removing those, as
semantically it's the only point where we can do that, since we know the
state of the VM which will be abandoned.

The problem lies in the fact that between 'snap2' and present the user
could have exchanged disks and then we don't have enough metadata to
create the overlays.

One big additional caveat is that if the user exchanged disk images we
_must not_ delete the changed image, so the metadata wrangling might be
non-trivial in this case.
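
A minimal sketch of that safety rule, in Python: an active image is only a candidate for removal on revert if it is exactly the overlay that the last snapshot created for that disk. The function name, the dictionaries and the paths are all hypothetical; real code would have to derive the "overlay created by the last snapshot" mapping from snapshot metadata, which is the hard part described above.

```python
def removable_after_revert(active_images, overlays_from_last_snapshot):
    """Split active images into (removable, kept) per disk target.

    active_images: target -> image the VM currently runs on.
    overlays_from_last_snapshot: target -> overlay the last snapshot created.
    An image is removable only if it is that recorded overlay; anything the
    user swapped in afterwards is kept untouched.
    """
    removable = {}
    kept = {}
    for target, image in active_images.items():
        if overlays_from_last_snapshot.get(target) == image:
            removable[target] = image
        else:
            kept[target] = image
    return removable, kept


# Hypothetical state: vda still runs on the overlay made by snap2,
# vdb was exchanged by the user after snap2.
active = {"vda": "/images/snap2.qcow2", "vdb": "/images/user-swapped.qcow2"}
recorded = {"vda": "/images/snap2.qcow2", "vdb": "/images/data.snap2"}
removable, kept = removable_after_revert(active, recorded)
```

Here only vda's overlay would be considered abandoned; the exchanged vdb image stays.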

> At the same time one needs to be aware that the revert operation loses the
> current state, and later one can revert only to the states of snapshots.
> This is the way internal snapshots work and the way one expects external
> snapshots to work too.

That is an acceptable caveat for the user. As noted above, as long as you
can take a snapshot & atomically destroy the VM it's acceptable.

For libvirt it's a bit harder, as described above, especially if we don't
want to keep littering the disk with unused and invalid images.

> [2] takes the approach of reusing the current top image as the overlay on
> revert, so that in the above example disk.snap2 will be an overlay over
> disk.qcow2 after reverting to the snap1 snapshot. IMHO this is a confusing
> naming scheme.

No, in our discussion with Pavel we've ruled out all special cases as
too much hassle to implement with little benefit.

If a user wants to revert&abandon some state, they can do 2 operations of
reverting and then deleting the unwanted snapshots.

> [3] suggests a different scheme for naming images. For example, after
> taking snapshot snap1 the chain should be disk.snap1 <- disk.qcow2, which
> looks very appealing to me. With this naming, using the [2] approach is
> quite natural.

IMO we can't really change the naming any more, users baked scripts on
top of this. Using a flag is also mostly pointless as users will keep
creating snapshots without it. Thus I don't think this is a viable
approach. The only advantage it would have is that it would be simpler to
visualize what happened, but ideally users would not need to care about
the steps and filenames at all.

> Implementing this does not look hard even for a running domain, but this is
> a big change to the API and all mgmt needs to be aware of it (well, it could
> be done safely using a new flag).

Yup, too much hassle, questionable outcome. Not worth it.

> 

> Anyway we can go on with the current image names. In order to specify the
> overlay image name let's introduce a new API:
>
>     int virDomainRevertToSnapshotXML(virDomainSnapshotPtr snapshot,
>                                      char *xmlDesc,
>                                      unsigned int flags);

I was briefly considering that the new snapshot API should also have
provisions for specifying a snapshot XML to take the snapshot of the
abandoned state, but as discussed it's not really necessary if you can
do it in 2 steps.

> with XML like:
>
>     <domainsnapshotrevert>
>       <disks>
>         <disk name='vda'>
>           <source file='/path/to/revert/overlay'/>
>         </disk>
>       </disks>
>     </domainsnapshotrevert>
>
> Having an XML looks like a bit of overkill right now but I could not
> find a better solution.

I don't see any other option. In case you have e.g. networked storage
where we can't figure out stuff by ourselves, you basically don't have
any other option.
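
For illustration, a management application could assemble the proposed document along these lines. This is only a sketch in Python using the standard library: the <domainsnapshotrevert> element layout is the one proposed in this thread, not an existing libvirt schema, and build_revert_xml is a hypothetical helper name. The proposed virDomainRevertToSnapshotXML call itself does not exist yet, so the sketch stops at producing the XML string.

```python
import xml.etree.ElementTree as ET


def build_revert_xml(overlays):
    """Build the proposed revert XML from a mapping of disk name -> overlay path."""
    root = ET.Element("domainsnapshotrevert")
    disks = ET.SubElement(root, "disks")
    for name, path in overlays.items():
        disk = ET.SubElement(disks, "disk", {"name": name})
        ET.SubElement(disk, "source", {"file": path})
    return ET.tostring(root, encoding="unicode")


# One disk, with the overlay path taken from the example in the thread.
xml_desc = build_revert_xml({"vda": "/path/to/revert/overlay"})

# Parse it back to check the structure that was produced.
parsed = ET.fromstring(xml_desc)
overlay_for_vda = parsed.find("./disks/disk[@name='vda']/source").get("file")
```

For networked storage the <source> element would need more than a file attribute; that part is left open here, as it is in the proposal.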

> If the overlay name is omitted, the generated name will be like disk.snap1-1
> in the above example, in line with the snapshot creation case, so that the
> disk.snap1* images hold states derived from the snap1 state. We can also
> support reverting to an external snapshot through the existing
> virDomainRevertToSnapshot for those who rely on generated names, using this
> naming scheme.

Yes, this would be possible, but quite fragile. But with regular
snapshots without any magic it should work.
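
A rough sketch of the generated names for local files, in Python. It assumes the snapshot overlay name is the image name with its suffix replaced by the snapshot name (which matches disk.qcow2 -> disk.snap1 in this thread) and that a revert overlay appends "-N". Picking the first free N is my assumption; the proposal only mentions disk.snap1-1. Both function names are hypothetical.

```python
import os


def snapshot_overlay_name(image_path, snapshot_name):
    """disk.qcow2 + snap1 -> disk.snap1 (suffix replaced by the snapshot name)."""
    stem, _ext = os.path.splitext(image_path)
    return f"{stem}.{snapshot_name}"


def revert_overlay_name(image_path, snapshot_name, exists=os.path.exists):
    """disk.qcow2 + snap1 -> disk.snap1-1, or the next free -N (assumed rule).

    'exists' is injectable so the naming can be exercised without touching
    the filesystem.
    """
    base = snapshot_overlay_name(image_path, snapshot_name)
    n = 1
    while exists(f"{base}-{n}"):
        n += 1
    return f"{base}-{n}"


# Exercise the naming against a pretend set of already existing files.
taken = {"/images/disk.snap1-1"}
first = revert_overlay_name("/images/disk.qcow2", "snap1", exists=lambda p: False)
second = revert_overlay_name("/images/disk.qcow2", "snap1", exists=taken.__contains__)
```

This only works where a path on a local filesystem is meaningful, which is why an explicit overlay name is still needed in the general case.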

One thing you've missed though is that deletion of snapshots now becomes
quite a challenge.

Yeah, I did not consider implementing deletion at that moment. However, to make
external snapshots usable it should be implemented as well.

Anyway, is anybody at Red Hat working on this (considering you and Pavel discussed
the topic recently)? If not, I would like to proceed with the implementation.

Nikolay

                     s              s              s
                     n              n              n
                     a              a              a
                     p              p              p
                     1              2              3
o                    │              │              │
r    base.qcow2      │  snap1.qcow2 │  snap2.qcow2 │
i ─────────────────► │ ────────────►│ ────────────►│
g                    │              │              │
i                    │              │              │
n                    │
                     │                                    p
                     │                                    r
                     │  alternatehistory.qcow2            e
                     │ ────────────────────────────────►  s
                     │                                    e
                     │                                    n
                                                          t

Specifically, the premise is that we don't want to keep unnecessary
images (e.g. the trivial solution for deleting 'snap1' would be to
remove the metadata and keep base.qcow2 intact), as without the metadata
it will become very hard to determine which images are actually still
used.

So in the above scenario:

- User wants to delete snap3:
  'snap2.qcow2' overlay becomes unreachable and should be deleted.

- User wants to delete snap2:
  There are 2 options in this case:
    1) block-pull snap1 into snap2
    2) block-commit snap2 into snap1
  The optimal solution is the one that transfers less data, but in the
  end either of them is equally good in terms of saved storage.

- User wants to delete snap1:
  This is a bit harder:
   1) commit into base.qcow2 would invalidate 'alternatehistory.qcow2'
   2) pull of base into both snap1.qcow2 and alternatehistory.qcow2 is
      possible, but may yield big images
   3) the last option is to forbid deletion of snapshots that have
      alternate histories

The first thing we should do though is to forbid creation of mixed trees
of internal and external snapshots, because the matrix of stuff to handle
would explode way beyond this already rather complex situation.
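
The three deletion cases from the scenario above can be sketched as a small classification in Python. This is a simplified model, not libvirt code: the Snapshot class, the plan labels and the size-based choice between pull and commit are all assumptions made for illustration. A snapshot's "branches" are its child snapshots plus the active overlay if the present state derives directly from it.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    name: str
    children: list = field(default_factory=list)  # names of child snapshots
    is_current_parent: bool = False  # the active overlay branches off here


def deletion_plan(snap):
    """Classify what deleting 'snap' would involve in this simplified model."""
    branches = len(snap.children) + (1 if snap.is_current_parent else 0)
    if branches == 0:
        return "drop-image"   # the image holding this state becomes unreachable
    if branches == 1:
        return "merge"        # block-pull or block-commit the two images
    return "forbid-or-pull-into-each-branch"  # alternate histories exist


def cheaper_merge(lower_bytes, upper_bytes):
    """Pull copies the lower image's data up; commit copies the upper's down."""
    return "block-commit" if upper_bytes < lower_bytes else "block-pull"


# The tree from the diagram: the present state runs on alternatehistory.qcow2,
# which derives from snap1.
tree = {
    "snap1": Snapshot("snap1", ["snap2"], is_current_parent=True),
    "snap2": Snapshot("snap2", ["snap3"]),
    "snap3": Snapshot("snap3"),
}
plans = {name: deletion_plan(snap) for name, snap in tree.items()}
```

In this model snap3 maps to dropping snap2.qcow2, snap2 to a merge of snap1.qcow2 and snap2.qcow2, and snap1 to the hard case. Mixed internal/external trees are deliberately not modelled.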