On Tue, Mar 12, 2019 at 04:52:24PM -0500, Eric Blake wrote:
> On 3/12/19 4:35 PM, Nir Soffer wrote:
>
> >>> We don't have a need to list or define snapshots since we managed
> >>> snapshots on oVirt side.
> >>> We want an API to list and redefine checkpoints.
> >>
> >> But the proposed <domaincheckpoint> also has a <domain> subelement, so
> >> it has the same problem (the XML for a bulk operation can become
> >> prohibitively large).
> >>
> >
> > Why do we need <domain> in a checkpoint?
>
> The initial design for snapshots did not have <domain>, and it bit us
> hard; you cannot safely revert to a snapshot if you don't know the state
> of the domain at that time. The initial design for checkpoints has thus
> mandated that <domain> be present (my v5 series fails to redefine a
> snapshot if you omit <domain>, even though it has a NO_DOMAIN flag
> during dumpxml for reduced output). If we are convinced that defining a
> snapshot without <domain> is safe enough, then I can relax the
> checkpoint code to allow redefined metadata without <domain> the way
> snapshots already have to do it, even though I was hoping that
> checkpoints could start life with fewer back-compats that snapshot has
> had to carry along. But I'd rather start strict and relax later
> (require <domain> and then remove it when proven safe), and not start
> loose (leave <domain> optional, and then wish we had made it mandatory).

Given that a guest domain's XML can be changed at runtime at any point,
I don't see how omitting <domain> from the checkpoint XML is safe in
general. Even if apps think it is safe now and omit it, a future version
of the app might change in a way that makes omitting <domain> unsafe.
If we didn't historically record the <domain> in the checkpoint in the
first place, then the new version of the app is potentially in trouble.

So I think it is good that we are strict and mandate the <domain> XML,
even if it is not technically required in some use cases.

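To illustrate what "strict" means from the application side, here is a
minimal sketch of a pre-flight check an app could run before redefining a
checkpoint. It is only an illustration, not libvirt code: the helper name
require_domain() is hypothetical, and the sample XML is trimmed down to the
<domaincheckpoint> and <domain> elements discussed above plus placeholder
<name> children.

```python
import xml.etree.ElementTree as ET


def require_domain(checkpoint_xml: str) -> ET.Element:
    """Parse checkpoint XML and insist on a <domain> subelement.

    Hypothetical helper for illustration; not part of any libvirt API.
    """
    root = ET.fromstring(checkpoint_xml)
    if root.tag != "domaincheckpoint":
        raise ValueError(f"expected <domaincheckpoint>, got <{root.tag}>")
    # Strict policy: metadata without the recorded <domain> is rejected.
    if root.find("domain") is None:
        raise ValueError("checkpoint XML has no <domain> subelement")
    return root


# Placeholder documents, assumed purely for the example.
WITH_DOMAIN = (
    "<domaincheckpoint><name>c1</name>"
    "<domain><name>vm</name></domain></domaincheckpoint>"
)
WITHOUT_DOMAIN = "<domaincheckpoint><name>c1</name></domaincheckpoint>"

if __name__ == "__main__":
    require_domain(WITH_DOMAIN)  # accepted
    try:
        require_domain(WITHOUT_DOMAIN)
    except ValueError as err:
        print("rejected:", err)
```

Rejecting the second document up front is the "start strict" choice: it can
be relaxed later without breaking anyone, whereas the reverse cannot.
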
> > Note that vdsm may be killed in the middle of the redefine loop, and in
> > this case we leave libvirt with partial info about checkpoints, and we
> > need to redefine the checkpoints again handling this partial state.
>
> But that's relatively easy - if you don't know whether libvirt might
> have partial data, then wipe the data and start the redefine loop from
> scratch.

Of course, the same failure scenario applies if libvirt is doing it via
a bulk operation. The redefine loop still exists, just inside libvirt
instead, and libvirt might be killed or die partway through it. So you're
not really fixing a failure scenario, just moving the failure to a
different piece. That's no net win.

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list