On Mon, Sep 13, 2021 at 17:52:49 +0300, Nikolay Shirokovskiy wrote:
> On Wed, Sep 8, 2021 at 12:27, Peter Krempa <pkrempa@xxxxxxxxxx> wrote:
> > On Tue, Aug 31, 2021 at 16:46:25 +0300, Nikolay Shirokovskiy wrote:

[...]

> > A rather big set of problems which we will necessarily encounter
> > when implementing this comes from the following:
> >
> > 1) users can modify the VM between snapshots or between the last
> > snapshot and the abandoned state, and this modification (e.g.
> > changing a disk image) must be considered to prevent data loss when
> > manipulating the images.
>
> Hi, Peter!
>
> Can you please explain in more detail what the issue is? For example,
> if after snap1 I changed the disk image to some base2.qcow2, then
> snap1.qcow2 is expected to be managed by mgmt (expected to be
> deleted, I guess).

Well, if the management deletes them (at least 'base.qcow2'), it's a
problem because the VM can no longer be reverted to 'snap1'. As you've
correctly pointed out, 'snap1.qcow2' is unreachable and thus should
have been deleted.

> Then I made a snap2. Now I have base.qcow2 (keeping snap1 state) and
> the base2.qcow2 (keeping snap2 state) <- snap2.qcow2 chain. Thanks to
> the metadata I can revert to both snap1 and snap2, as I know the
> domain xml in those states and these xmls reference the right images
> (snap1 references base.qcow2 and snap2 references base2.qcow2).

Sure, but programmatically checking all the possibilities (e.g. in
this scenario:

                 s                          s
                 n                          n
                 a                          a
                 p                          p
                 1                          2

    vda.qcow2    │ vda.snap1               │ vda.snap2
  o ────────────►│ ──────────────────────► │ ──────────
p r
r   vdb.qcow2                              │ vdb.snap2
e i ──────────────────────────────────────►│ ──────────
s g
e   vdc.qcow2      vdc.snap1   vdc.other   │ vdc.snap2
n i ────────────►│ ─────────X ──────────►  │ ──────────
t n
    vdd.qcow2    │ vdd.snap1   vdd.snap1   │ vdd.snap2
    ────────────►│ ─────────X ──────────►  │ ──────────

) will become a bit of a nightmare. E.g. checking that 'vdc' was
replaced by a different image might be feasible when the names don't
match, but if a user just rewrites the image in place it's actually
impossible for us to detect.

This will require a bit more thinking about how to deal with certain
situations, and we might even need to introduce a flag for reverting
from an unsafe scenario such as:

                 s
                 n
                 a
                 p
                 1

    vda.qcow2    │ vda.snap1
    ────────────►│ ──────────────────────

    vdc.qcow2      vdc.snap1   vdc.other
    ────────────►│ ─────────X ──────────

Even here we need to be careful, because reverting back to snap1 will
possibly leak 'vdc.other'. To manage user expectations properly we
might want to add a flag which will e.g. disallow reversion if files
which aren't recorded in 'snap1' are going to be left behind, but we
must never delete 'vdc.other' ourselves, to prevent data loss.

> > 2) libvirt doesn't record the XML used to create the snapshot (e.g.
>
> It is not quite so. According to
> https://libvirt.org/formatsnapshot.html#example
> the XML given to the create API is saved in the snapshot metadata. Or
> am I missing something?

No, you are right. I forgot that we actually do, which is good. This
will give us the possibility to do a lot of checks to prevent breaking
the user's setup.

> > snap1.xml) thus we don't actually know what the (historical)
> > snapshot was actually recording. Coupled with the fact that the
> > user could have changed the VM definition between 'snap1' and
> > 'snap2' we can't even infer what the state was.
>
> Here too I cannot follow what the issue is. Can you please give an
> example?

This point is no longer applicable.
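Since the create-time XML is kept in the snapshot metadata, the kind of
consistency check discussed above is at least possible. Below is a
rough sketch of the idea using the libvirt Python bindings; the domain
name 'vm1', the snapshot name 'snap1' and the whole approach are made
up for illustration, this is not how libvirt itself implements
anything:

  import xml.etree.ElementTree as ET
  import libvirt

  def disk_sources(domain_xml):
      # Map disk target (e.g. 'vda') to its <source file=...> path.
      srcs = {}
      for disk in ET.fromstring(domain_xml).findall("./devices/disk"):
          target = disk.find("target")
          source = disk.find("source")
          if target is not None and source is not None:
              srcs[target.get("dev")] = source.get("file")
      return srcs

  conn = libvirt.open("qemu:///system")
  dom = conn.lookupByName("vm1")               # hypothetical domain
  snap = dom.snapshotLookupByName("snap1", 0)  # hypothetical snapshot

  # The snapshot metadata records the overlay created for each disk.
  overlays = {}
  for disk in ET.fromstring(snap.getXMLDesc(0)).findall("./disks/disk"):
      source = disk.find("source")
      if disk.get("snapshot") == "external" and source is not None:
          overlays[disk.get("name")] = source.get("file")

  current = disk_sources(dom.XMLDesc(0))

  for dev, path in overlays.items():
      if dev not in current:
          print(f"{dev}: disk removed since the snapshot")
      elif current[dev] != path:
          # This also triggers legitimately when a later snapshot put
          # another overlay on top, so a real check would have to walk
          # the whole backing chain, not just the top image.
          print(f"{dev}: image changed: {path} -> {current[dev]}")

And, as the 'vdd' row in the diagram above shows, a comparison like
this only catches renames; an image rewritten in place under the same
name cannot be detected this way.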
We do record the snapshot XML too, so this allows us to check whether
the current configuration of the VM is consistent with the state at
the time the snapshot was created.

Basically the hardest part of this is to ensure that we don't remove
or corrupt any image with user data which might still be needed. For
simple VM definitions this is very simple, but since we give users
quite a lot of freedom in what they can do (including replacing disks,
taking partial snapshots, etc.) we need to be prepared for a
comparatively large number of scenarios.

Unfortunately we're risking data loss here, so we need to approach it
very carefully, especially for scenarios where individual users do
their management with e.g. virsh and might not have tested first
whether everything works on throwaway VMs.

[...]

> > One thing you've missed though is that deletion of snapshots now
> > becomes quite a challenge.
>
> Yeah, I did not consider implementing deletion at that point.
> However, to make external snapshots usable it should be implemented
> as well.

Yeah, both need to be implemented at the same time.

> Anyway, is anybody at Red Hat working on this (considering you and
> Pavel discussed the topic recently)? If not, I would like to proceed
> with the implementation.

We definitely plan to work on it, but I can't give you any time
estimates yet. More importantly, since you are interested in this as
well, it would be great if you could elaborate on how you want to use
it once it's ready, especially any special scenarios.

For us the basic goal is to achieve feature parity between internal
and external snapshots so that they can be used interchangeably,
eventually preferring external snapshots as the way forward.
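To make the deletion challenge concrete: for a running VM, deleting an
external snapshot means merging the overlay it created back into its
backing image, disk by disk, before the snapshot metadata can be
dropped. A rough sketch with the libvirt Python bindings, with made-up
domain and image names, and with none of the safety checks discussed
above:

  import time
  import libvirt

  conn = libvirt.open("qemu:///system")
  dom = conn.lookupByName("vm1")   # hypothetical domain

  disk = "vda"
  base = "/images/vda.qcow2"   # image the overlay is merged into
  top = "/images/vda.snap1"    # overlay created by the snapshot

  # Merge 'top' into 'base'. When 'top' is the active image this has
  # to be an active commit, which stays synchronized until we pivot.
  dom.blockCommit(disk, base, top, 0,
                  libvirt.VIR_DOMAIN_BLOCK_COMMIT_ACTIVE)

  # Poll for the job to catch up; real code would listen for the
  # block job READY event instead.
  while True:
      info = dom.blockJobInfo(disk, 0)
      if info and info["cur"] == info["end"]:
          break
      time.sleep(0.5)

  # Switch the VM back to 'base' and end the job.
  dom.blockJobAbort(disk, libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)

  # Only afterwards can the snapshot metadata (and eventually the
  # now-unused overlay file) be cleaned up.
  snap = dom.snapshotLookupByName("snap1", 0)
  snap.delete(libvirt.VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY)

Doing this safely across partial snapshots, replaced disks and
inactive VMs is where the real complexity lies.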