On Thu, Oct 11, 2018 at 13:06:14 +0100, Dr. David Alan Gilbert wrote: > * Peter Krempa (pkrempa@xxxxxxxxxx) wrote: > > On Tue, Oct 09, 2018 at 19:34:43 +0200, Markus Armbruster wrote: > > > Cc: libvir-list for review of the compatibility argument below. > > > > > > Daniel Henrique Barboza <danielhb413@xxxxxxxxx> writes: > > > > > > > Hey David, > > > > > > > > On 9/21/18 9:29 AM, Dr. David Alan Gilbert wrote: > > > >> * Daniel Henrique Barboza (danielhb413@xxxxxxxxx) wrote: > > > >>> changes in v2: > > > >>> - removed the "RFC" marker; > > > >>> - added a new patch (patch 2) that removes > > > >>> bdrv_snapshot_delete_by_id_or_name from the code; > > > >>> - made changes in patch 1 as suggested by Murilo; > > > >>> - previous patch set link: > > > >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html > > > >>> > > > >>> > > > >>> It is not uncommon to see bugs being opened by testers that attempt to > > > >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite > > > >>> common snapshot names and they trigger a lot of bugs. I gave an example > > > >>> in the commit message of patch 1, but to sum up here: QEMU treats the > > > >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It > > > >>> is documented as such, but this can lead to strange situations. > > > >>> > > > >>> Given that it is strange for an API to consider a parameter to be 2 fields > > > >>> at the same time, and inadvently treating them as one or the other, and > > > >>> that removing the ID field is too drastic, my idea here is to keep the > > > >>> ID field for internal control, but do not let the user set it. > > > >>> > > > >>> I guess there's room for discussion about considering this change an API > > > >>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt, > > > >>> but simplifying the meaning of the parameters of savevm/loadvm/delvm. > > > >> Can you clarify a couple of things: > > > >> a) What is it about libvirt's use that means it's OK ? Does it never > > > >> use numeric tags? > > > > > > > > I am glad you asked because my understanding in how Libvirt was dealing > > > > with snapshots was wrong, and I just looked into it further to answer your > > > > question. Luckily, this series fixes the situation for Libvirt as well. > > > > > > > > I was misled by the fact that Libvirt does not show the same symptoms > > > > we see in > > > > QEMU of this problem, but the bug is there. Here's a quick test with > > > > Libvirt with > > > > "0" and "1" as snapshot names, considering a VM named with no snapshots, > > > > using QEMU 2.12 and Libvirt 4.0.0: > > > > > > > > - create the "0" snapshot: > > > > > > > > $ sudo virsh snapshot-create-as --name 0 dhb > > > > Domain snapshot 0 created > > > > > > > > $ sudo virsh snapshot-list dhb > > > > Name Creation Time State > > > > ------------------------------------------------------------ > > > > 0 2018-09-24 15:47:56 -0400 running > > > > > > > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots > > > > List of snapshots present on all disks: > > > > ID TAG VM SIZE DATE VM CLOCK > > > > -- 0 405M 2018-09-24 15:47:56 00:04:20.867 > > > > > > > > > > > > - created the "1" snapshot. Here, Libvirt shows both snapshots with > > > > snapshot-list, > > > > but the snapshot was erased inside QEMU: > > > > > > > > $ sudo virsh snapshot-create-as --name 1 dhb > > > > Domain snapshot 1 created > > > > $ > > > > $ sudo virsh snapshot-list dhb > > > > Name Creation Time State > > > > ------------------------------------------------------------ > > > > 0 2018-09-24 15:47:56 -0400 running > > > > 1 2018-09-24 15:50:09 -0400 running > > > > > > > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots > > > > List of snapshots present on all disks: > > > > ID TAG VM SIZE DATE VM CLOCK > > > > -- 1 404M 2018-09-24 15:50:10 00:05:36.226 > > > > > > > > > > > > This is where I stopped checking out Libvirt at first, concluding > > > > wrongly that it > > > > was immune to the bug. > > > > > > > > Libvirt does not throw an error when trying to apply snapshot 0. In > > > > fact, it acts > > > > like everything went fine: > > > > > > > > $ sudo virsh snapshot-revert dhb 0 > > > > > > > > $ echo $? > > > > 0 > > > > > > Is that a libvirt bug? > > > > Probably yes. The error handling from HMP sucks and can't really be > > fixed in all cases. This is for the handler which calls "loadvm": > > > > if (strstr(reply, "No block device supports snapshots") != NULL) { > > virReportError(VIR_ERR_OPERATION_INVALID, "%s", > > _("this domain does not have a device to load snapshots")); > > goto cleanup; > > } else if (strstr(reply, "Could not find snapshot") != NULL) { > > virReportError(VIR_ERR_OPERATION_INVALID, > > _("the snapshot '%s' does not exist, and was not loaded"), > > name); > > goto cleanup; > > } else if (strstr(reply, "Snapshots not supported on device") != NULL) { > > virReportError(VIR_ERR_OPERATION_INVALID, "%s", reply); > > goto cleanup; > > } else if (strstr(reply, "Could not open VM state file") != NULL) { > > virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply); > > goto cleanup; > > } else if (strstr(reply, "Error") != NULL > > && strstr(reply, "while loading VM state") != NULL) { > > virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply); > > goto cleanup; > > } else if (strstr(reply, "Error") != NULL > > && strstr(reply, "while activating snapshot on") != NULL) { > > virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply); > > goto cleanup; > > } > > > > If the above does not match the reported error, we report success. The > > same problem can happen with any of the 5 [1] HMP command handlers we > > still have. > > > > Note that a similar abomination is also for the code which calls > > "savevm". > > Would the following small qemu change make life a little safer: > > diff --git a/hmp.c b/hmp.c > index 576253a01f..0729a8c7ed 100644 > --- a/hmp.c > +++ b/hmp.c > @@ -62,7 +62,7 @@ static void hmp_handle_error(Monitor *mon, Error **errp) > { > assert(errp); > if (*errp) { > - error_report_err(*errp); > + error_reportf_err(*errp, "Error: "); > } > } > > changing: > > No block device supports snapshots > > > to: > > Error: No block device supports snapshots > > so you could check for Error: at the start and know that you've hit some > unknown error? That certainly would help, provided that the message is not localized in any way. I'd be very glad to see it also for 'savevm' and 'delvm'. It's a shame that drive_add and drive_del don't have that either, but they will soon become unused.
Attachment:
signature.asc
Description: PGP signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list