* Peter Krempa (pkrempa@xxxxxxxxxx) wrote:
> On Tue, Oct 09, 2018 at 19:34:43 +0200, Markus Armbruster wrote:
> > Cc: libvir-list for review of the compatibility argument below.
> >
> > Daniel Henrique Barboza <danielhb413@xxxxxxxxx> writes:
> >
> > > Hey David,
> > >
> > > On 9/21/18 9:29 AM, Dr. David Alan Gilbert wrote:
> > >> * Daniel Henrique Barboza (danielhb413@xxxxxxxxx) wrote:
> > >>> changes in v2:
> > >>> - removed the "RFC" marker;
> > >>> - added a new patch (patch 2) that removes
> > >>>   bdrv_snapshot_delete_by_id_or_name from the code;
> > >>> - made changes in patch 1 as suggested by Murilo;
> > >>> - previous patch set link:
> > >>>   https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> > >>>
> > >>> It is not uncommon to see bugs being opened by testers who attempt to
> > >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
> > >>> common snapshot names, and they trigger a lot of bugs. I gave an example
> > >>> in the commit message of patch 1, but to sum up here: QEMU treats the
> > >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> > >>> is documented as such, but this can lead to strange situations.
> > >>>
> > >>> Given that it is strange for an API to consider a parameter to be two
> > >>> fields at the same time, inadvertently treating it as one or the other,
> > >>> and that removing the ID field is too drastic, my idea here is to keep
> > >>> the ID field for internal control, but not let the user set it.
> > >>>
> > >>> I guess there's room for discussion about whether to consider this an
> > >>> API change or not. It doesn't affect users of HMP and it doesn't affect
> > >>> Libvirt, while simplifying the meaning of the parameters of
> > >>> savevm/loadvm/delvm.
> > >> Can you clarify a couple of things:
> > >>   a) What is it about libvirt's use that means it's OK? Does it never
> > >>      use numeric tags?
> > >
> > > I am glad you asked, because my understanding of how Libvirt was dealing
> > > with snapshots was wrong, and I just looked into it further to answer
> > > your question. Luckily, this series fixes the situation for Libvirt as
> > > well.
> > >
> > > I was misled by the fact that Libvirt does not show the same symptoms of
> > > this problem that we see in QEMU, but the bug is there. Here's a quick
> > > test with Libvirt, using "0" and "1" as snapshot names on a VM named
> > > 'dhb' with no snapshots, with QEMU 2.12 and Libvirt 4.0.0:
> > >
> > > - create the "0" snapshot:
> > >
> > > $ sudo virsh snapshot-create-as --name 0 dhb
> > > Domain snapshot 0 created
> > >
> > > $ sudo virsh snapshot-list dhb
> > >  Name                 Creation Time              State
> > > ------------------------------------------------------------
> > >  0                    2018-09-24 15:47:56 -0400  running
> > >
> > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots
> > > List of snapshots present on all disks:
> > > ID        TAG               VM SIZE                DATE       VM CLOCK
> > > --        0                    405M 2018-09-24 15:47:56   00:04:20.867
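(An aside on why the next step behaves the way it does, as far as I can
tell: qcow2 assigns snapshot IDs starting at "1", so the snapshot *tagged*
"0" above gets internal *ID* "1". "info snapshots" prints "--" in the ID
column only because the ID is not guaranteed to be the same on all images.
Since the snapshot commands accept either the ID or the tag, a request for
"1" matches that snapshot by ID. A minimal sketch of the matching rule; the
struct and function names below are made up for illustration, this is not
QEMU's actual code:

    #include <stdio.h>
    #include <string.h>

    /* Simplified stand-in for a qcow2 snapshot entry: only the two
     * identifying fields matter for this example. */
    struct snap {
        const char *id_str;   /* auto-assigned by qcow2: "1", "2", ... */
        const char *name;     /* user-supplied tag */
    };

    /* Sketch of the "match by ID or by tag" lookup that savevm/loadvm/delvm
     * apply to their argument. */
    static int snap_matches(const struct snap *sn, const char *arg)
    {
        return strcmp(sn->id_str, arg) == 0 || strcmp(sn->name, arg) == 0;
    }

    int main(void)
    {
        /* The snapshot created above: tag "0", internal ID "1". */
        struct snap first = { .id_str = "1", .name = "0" };

        /* savevm deletes any existing snapshot matching its argument
         * before writing the new one. */
        printf("\"1\" matches the snapshot tagged \"0\": %s\n",
               snap_matches(&first, "1") ? "yes, by ID" : "no");
        return 0;
    }

So "savevm 1" first removes the snapshot tagged "0", which is exactly what
the second step below shows.)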
> > > - create the "1" snapshot. Here, Libvirt shows both snapshots with
> > >   snapshot-list, but the first snapshot was erased inside QEMU:
> > >
> > > $ sudo virsh snapshot-create-as --name 1 dhb
> > > Domain snapshot 1 created
> > > $
> > > $ sudo virsh snapshot-list dhb
> > >  Name                 Creation Time              State
> > > ------------------------------------------------------------
> > >  0                    2018-09-24 15:47:56 -0400  running
> > >  1                    2018-09-24 15:50:09 -0400  running
> > >
> > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots
> > > List of snapshots present on all disks:
> > > ID        TAG               VM SIZE                DATE       VM CLOCK
> > > --        1                    404M 2018-09-24 15:50:10   00:05:36.226
> > >
> > > This is where I stopped checking out Libvirt at first, wrongly
> > > concluding that it was immune to the bug.
> > >
> > > Libvirt does not throw an error when trying to apply snapshot 0. In
> > > fact, it acts like everything went fine:
> > >
> > > $ sudo virsh snapshot-revert dhb 0
> > >
> > > $ echo $?
> > > 0
> >
> > Is that a libvirt bug?
>
> Probably yes. The error handling from HMP sucks and can't really be
> fixed in all cases. This is for the handler which calls "loadvm":
>
>     if (strstr(reply, "No block device supports snapshots") != NULL) {
>         virReportError(VIR_ERR_OPERATION_INVALID, "%s",
>                        _("this domain does not have a device to load snapshots"));
>         goto cleanup;
>     } else if (strstr(reply, "Could not find snapshot") != NULL) {
>         virReportError(VIR_ERR_OPERATION_INVALID,
>                        _("the snapshot '%s' does not exist, and was not loaded"),
>                        name);
>         goto cleanup;
>     } else if (strstr(reply, "Snapshots not supported on device") != NULL) {
>         virReportError(VIR_ERR_OPERATION_INVALID, "%s", reply);
>         goto cleanup;
>     } else if (strstr(reply, "Could not open VM state file") != NULL) {
>         virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
>         goto cleanup;
>     } else if (strstr(reply, "Error") != NULL
>                && strstr(reply, "while loading VM state") != NULL) {
>         virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
>         goto cleanup;
>     } else if (strstr(reply, "Error") != NULL
>                && strstr(reply, "while activating snapshot on") != NULL) {
>         virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
>         goto cleanup;
>     }
>
> If the above does not match the reported error, we report success. The
> same problem can happen with any of the 5 [1] HMP command handlers we
> still have.
>
> Note that a similar abomination also exists in the code which calls
> "savevm".

Would the following small qemu change make life a little safer:

diff --git a/hmp.c b/hmp.c
index 576253a01f..0729a8c7ed 100644
--- a/hmp.c
+++ b/hmp.c
@@ -62,7 +62,7 @@ static void hmp_handle_error(Monitor *mon, Error **errp)
 {
     assert(errp);
     if (*errp) {
-        error_report_err(*errp);
+        error_reportf_err(*errp, "Error: ");
     }
 }

changing:
  No block device supports snapshots
to:
  Error: No block device supports snapshots

so you could check for "Error: " at the start and know that you've hit
some unknown error?

Dave

> [1] We technically have 6 HMP handlers, but "cpu_set" is not used if you
> have a QEMU less than 3 years old. Soon "drive_add" and "drive_del" will
> also be replaced, so we'll be stuck with the 3 internal snapshot ones.
>
> > > Reverting back to snapshot "1" works as intended, restoring the VM
> > > state from when it was created.
> > >
> > > (perhaps this is something we should let Libvirt be aware of ...)
>
> The error message from qemu, which was wrongly ignored by libvirt, can
> be found in the libvirtd debug log. It would be helpful if you could
> provide it.
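(Coming back to the hmp_handle_error() change suggested above: with the
"Error: " prefix in place, libvirt would no longer have to pattern-match
every known message text; any reply starting with that prefix could be
treated as a failure. A rough sketch of such a check; the helper name is
made up for illustration, this is not actual libvirt code:

    #include <stdbool.h>
    #include <string.h>

    /* Hypothetical helper: once hmp_handle_error() prefixes every error
     * with "Error: ", a reply starting with that prefix is known to be a
     * failure even if the specific message text is not recognised. */
    static bool hmp_reply_is_error(const char *reply)
    {
        return strncmp(reply, "Error: ", strlen("Error: ")) == 0;
    }

The existing strstr() checks could stay for the cases that map to specific
libvirt error codes, with this as a catch-all fallback instead of silently
reporting success.)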
> > >
> > > This series fixes this behavior, because snapshot 0 isn't erased from
> > > QEMU, allowing Libvirt to successfully restore it.
> > >
> > >> b) After this series are you always guaranteed to be able to fix
> > >>    any existing oddly named snapshots?
> > >
> > > The oddly named snapshots that already exist can be affected by the
> > > semantic change in loadvm and delvm, in that the user can't load/del
> > > them using the snapshot ID, just the tag. Aside from that, I don't see
> > > any side effects of this patch series on existing snapshots.
> >
> > Do all snapshots have a tag that is unique within their image? Even
> > snapshots created by old versions of QEMU?
>
> I remember there was a discussion about problems with collisions between
> the name and the ID of the snapshot when dealing with the loadvm/delvm
> commands, but I can't seem to find it currently. Note that libvirt
> plainly issues loadvm/delvm $SNAPSHOTNAME. If the name is numeric it
> might clash. Similarly for inactive VMs via qemu-img.

-- 
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list