On Thu, Mar 11, 2021 at 3:24 PM Peter Krempa <pkrempa@xxxxxxxxxx> wrote:
>
> On Thu, Mar 11, 2021 at 10:51:13 +0200, Liran Rotenberg wrote:
> > We recently had this bug[1]. The thought that came from it is the handling
> > of error code after running virDomainSnapshotCreateXML, we encountered
> > VIR_ERR_OPERATION_ABORTED(78).
>
> VIR_ERR_OPERATION_ABORTED is an error code which is emitted by the
> migration code only. That means that the error comes from the failure to
> take a memory image/snapshot of the VM.
>
> Quick skim through the bugreport seems to mention timeout, so your code
> probably aborted the snapshot if it was taking too long.
>
> > Apparently, the new volume is in use. Are there cases where this will
> > happen and the new volume won't appear in the volumes chain? Can we detect
> > / know when?
>
> In the vast majority of cases if virDomainSnapshotCreateXML returns
> failure the new disk volumes are NOT used at that point.
>
> Libvirt tries very hard to ensure that everything is atomic. The memory
> snapshot is taken before installing volumes into the backing chain, so
> if that one fails we don't even attempt to do anything with the disks.
>
> There are three extremely unlikely reasons where the snapshot API returns
> failure and new images were already installed into the backing chain:
>
> 1) resuming of the VM failed after snapshot
> 2) thawing (domfsthaw) of filesystems has failed
>    (easily avoided by not using the _QUIESCE flag, but freezing
>    manually)
> 3) saving of the internal VM state XML failed
>
> Any error except those above can happen only if the images weren't
> installed or the VM died while installing the images.
>
> In addition if resuming the cpus after the snapshot fails, the cpus
> didn't run so the guest couldn't have written anything to the image.
> Since snapshot is supposed to flush qemu caches, in case you destroy the
> VM without running the vcpus it's safe to discard the overlays as guest
> didn't write anything into them yet.
>
> > Thinking aloud, if we can detect such cases we can prevent rolling back by
> > reporting it back from VDSM to ovirt. Or, if it can't be detected to go on
> > the safe side in order to save data corruption and prevent the rollback as
> > well.
>
> In general, except for the case when saving of the guest XML has failed,
> the new disk images will not be used by the VM so it's safe to delete
> them.
>
> > Currently, in ovirt, if the job is aborted, we will look into the chain to
> > decide whether to rollback or not.
>
> This is okay, we update the XML only if qemu successfully installed the
> overlays.

Thanks Peter!
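
For the record, the chain check discussed above could look roughly like the sketch below. It assumes the live domain XML has already been fetched as a string (for example via virDomainGetXMLDesc) and that the disks are file or block based. The helper names overlay_in_use and should_rollback and the sample paths are hypothetical, not VDSM or libvirt code, and the sketch does not try to handle the case where saving the XML itself failed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down domain XML: one disk whose active layer is
# overlay.qcow2 on top of base.qcow2.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <source file='/images/overlay.qcow2'/>
      <backingStore type='file'>
        <source file='/images/base.qcow2'/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def overlay_in_use(domain_xml, overlay_path):
    """Return True if overlay_path is the active (top) layer of any disk."""
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk"):
        # Only the direct <source> child is the active layer; the ones
        # nested under <backingStore> are the backing images.
        source = disk.find("source")
        if source is None:
            continue
        # File-based disks use 'file', block-based disks use 'dev'.
        if overlay_path in (source.get("file"), source.get("dev")):
            return True
    return False


def should_rollback(domain_xml, new_overlays):
    """Roll back only if none of the new overlays made it into the XML."""
    return not any(overlay_in_use(domain_xml, path) for path in new_overlays)
```

With the sample above, should_rollback(SAMPLE_XML, ["/images/overlay.qcow2"]) is False because the overlay is already the active layer, while a path that never appears in the XML gives True.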