Re: [PATCH v4 0/8] bulk snapshot list/redefine (incremental backup saga)

Daniel P. Berrangé <berrange@xxxxxxxxxx> · Wed, 13 Mar 2019 10:02:28 +0000

On Tue, Mar 12, 2019 at 04:52:24PM -0500, Eric Blake wrote:
> On 3/12/19 4:35 PM, Nir Soffer wrote:
> 
> >>> We don't have a need to list or define snapshots since we managed
> >>> snapshots on oVirt side.
> >>> We want an API to list and redefine checkpoints.
> >>
> >> But the proposed <domaincheckpoint> also has a <domain> subelement, so
> >> it has the same problem (the XML for a bulk operation can become
> >> prohibitively large).
> >>
> > 
> > Why do we need <domain> in a checkpoint?
> 
> The initial design for snapshots did not have <domain>, and it bit us
> hard; you cannot safely revert to a snapshot if you don't know the state
> of the domain at that time. The initial design for checkpoints has thus
> mandated that <domain> be present (my v5 series fails to redefine a
> snapshot if you omit <domain>, even though it has a NO_DOMAIN flag
> during dumpxml for reduced output).  If we are convinced that defining a
> snapshot without <domain> is safe enough, then I can relax the
> checkpoint code to allow redefined metadata without <domain> the way
> snapshots already have to do it, even though I was hoping that
> checkpoints could start life with fewer back-compats that snapshot has
> had to carry along.  But I'd rather start strict and relax later
> (require <domain> and then remove it when proven safe), and not start
> loose (leave <domain> optional, and then wish we had made it mandatory).

Given that a guest domain XML can be changed at runtime at any point,
I don't see how omitting <domain> from the checkpoint XML is safe
in general. Even if apps think it is safe now and omit it, a future
version of the app might change in a way that makes omitting the
<domain> unsafe. If we didn't historically record the <domain> in
the checkpoint in the first place, then the new version of the app
is potentially in trouble. So I think it is good that we are strict
and mandate the <domain> XML even if it is not technically required
in some use cases.
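
To illustrate what "strict" means here, a minimal sketch of a check an
app could run before attempting a redefine. It only assumes the
<domaincheckpoint> and <domain> element names discussed above; the
helper name require_domain() is hypothetical and this is not libvirt's
actual parser.

```python
import xml.etree.ElementTree as ET


def require_domain(checkpoint_xml: str) -> ET.Element:
    """Return the embedded <domain> element, or raise if it is missing.

    Hypothetical helper: mirrors the "start strict" policy by refusing
    checkpoint metadata that does not record the domain definition.
    """
    root = ET.fromstring(checkpoint_xml)
    if root.tag != "domaincheckpoint":
        raise ValueError("expected <domaincheckpoint>, got <%s>" % root.tag)
    # Only a direct child counts; a <domain> nested elsewhere is ignored.
    domain = root.find("domain")
    if domain is None:
        raise ValueError("checkpoint XML must embed <domain>")
    return domain
```

Rejecting early keeps the stored metadata uniform, so a later version
of the app never has to guess whether an old checkpoint recorded the
domain state.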

> > Note that vdsm may be killed in the middle of the redefine loop, and in
> > this case we leave libvirt with partial info about checkpoints, and we
> > need to redefine the checkpoints again handling this partial state.
> 
> But that's relatively easy - if you don't know whether libvirt might
> have partial data, then wipe the data and start the redefine loop from
> scratch.

Of course the same failure scenario applies if libvirt is doing it via
a bulk operation. The redefine loop still exists, just inside libvirt
instead, which might be killed or die part way through. So you're not
really fixing a failure scenario, just moving the failure to a different
piece. That's no net win.
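
A rough sketch of the "wipe and start from scratch" recovery, to show
that it behaves the same no matter where the previous loop died. The
FakeStore class is a hypothetical in-memory stand-in for wherever the
checkpoint metadata lives (it is not the libvirt API), and crash_after
exists only to simulate the caller being killed mid-loop.

```python
class Crash(Exception):
    """Simulates the process being killed part way through the loop."""


class FakeStore:
    """Hypothetical in-memory stand-in for checkpoint metadata storage."""

    def __init__(self):
        self.checkpoints = {}

    def list_names(self):
        return list(self.checkpoints)

    def delete(self, name):
        del self.checkpoints[name]

    def redefine(self, name, xml):
        self.checkpoints[name] = xml


def redefine_all(store, saved, crash_after=None):
    """Wipe whatever metadata is present, then redefine everything.

    `saved` is an ordered list of (name, xml) pairs kept by the app.
    Because the wipe comes first, a previous partial run cannot leave
    stale or duplicate entries behind.
    """
    for name in store.list_names():
        store.delete(name)
    for i, (name, xml) in enumerate(saved):
        if crash_after is not None and i == crash_after:
            raise Crash()
        store.redefine(name, xml)
```

Calling redefine_all() again after a Crash ends with the full set
defined, whether the loop lives in the app or inside libvirt.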

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
