Re: [Qemu-devel] [PATCH v2 0/3] HMP/snapshot changes - do not use ID anymore

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 11, 2018 at 13:06:14 +0100, Dr. David Alan Gilbert wrote:
> * Peter Krempa (pkrempa@xxxxxxxxxx) wrote:
> > On Tue, Oct 09, 2018 at 19:34:43 +0200, Markus Armbruster wrote:
> > > Cc: libvir-list for review of the compatibility argument below.
> > > 
> > > Daniel Henrique Barboza <danielhb413@xxxxxxxxx> writes:
> > > 
> > > > Hey David,
> > > >
> > > > On 9/21/18 9:29 AM, Dr. David Alan Gilbert wrote:
> > > >> * Daniel Henrique Barboza (danielhb413@xxxxxxxxx) wrote:
> > > >>> changes in v2:
> > > >>> - removed the "RFC" marker;
> > > >>> - added a new patch (patch 2) that removes
> > > >>> bdrv_snapshot_delete_by_id_or_name from the code;
> > > >>> - made changes in patch 1 as suggested by Murilo;
> > > >>> - previous patch set link:
> > > >>> https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg04658.html
> > > >>>
> > > >>>
> > > >>> It is not uncommon to see bugs being opened by testers that attempt to
> > > >>> create VM snapshots using HMP. It turns out that "0" and "1" are quite
> > > >>> common snapshot names and they trigger a lot of bugs. I gave an example
> > > >>> in the commit message of patch 1, but to sum up here: QEMU treats the
> > > >>> input of savevm/loadvm/delvm sometimes as 'ID', sometimes as 'name'. It
> > > >>> is documented as such, but this can lead to strange situations.
> > > >>>
> > > >>> Given that it is strange for an API to consider a parameter to be 2 fields
> > > >>> at the same time, and inadvently treating them as one or the other, and
> > > >>> that removing the ID field is too drastic, my idea here is to keep the
> > > >>> ID field for internal control, but do not let the user set it.
> > > >>>
> > > >>> I guess there's room for discussion about considering this change an API
> > > >>> change or not. It doesn't affect users of HMP and it doesn't affect Libvirt,
> > > >>> but simplifying the meaning of the parameters of savevm/loadvm/delvm.
> > > >> Can you clarify a couple of things:
> > > >>    a) What is it about libvirt's use that means it's OK ? Does it never
> > > >> use numeric tags?
> > > >
> > > > I am glad you asked because my understanding in how Libvirt was dealing
> > > > with snapshots was wrong, and I just looked into it further to answer your
> > > > question. Luckily, this series fixes the situation for Libvirt as well.
> > > >
> > > > I was misled by the fact that Libvirt does not show the same symptoms
> > > > we see in
> > > > QEMU of this problem, but the bug is there. Here's a quick test with
> > > > Libvirt with
> > > > "0" and "1" as snapshot names, considering a VM named with no snapshots,
> > > > using QEMU 2.12 and Libvirt 4.0.0:
> > > >
> > > > - create the "0" snapshot:
> > > >
> > > > $ sudo virsh snapshot-create-as --name 0 dhb
> > > > Domain snapshot 0 created
> > > >
> > > > $ sudo virsh snapshot-list dhb
> > > > Name Creation Time State
> > > > ------------------------------------------------------------
> > > > 0 2018-09-24 15:47:56 -0400 running
> > > >
> > > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots
> > > > List of snapshots present on all disks:
> > > > ID TAG VM SIZE DATE VM CLOCK
> > > > --    0  405M 2018-09-24 15:47:56 00:04:20.867
> > > >
> > > >
> > > > - created the "1" snapshot. Here, Libvirt shows both snapshots with
> > > > snapshot-list,
> > > > but the snapshot was erased inside QEMU:
> > > >
> > > > $ sudo virsh snapshot-create-as --name 1 dhb
> > > > Domain snapshot 1 created
> > > > $
> > > > $ sudo virsh snapshot-list dhb
> > > > Name Creation Time State
> > > > ------------------------------------------------------------
> > > > 0 2018-09-24 15:47:56 -0400 running
> > > > 1 2018-09-24 15:50:09 -0400 running
> > > >
> > > > $ sudo virsh qemu-monitor-command dhb --hmp info snapshots
> > > > List of snapshots present on all disks:
> > > > ID TAG VM SIZE DATE VM CLOCK
> > > > --    1  404M 2018-09-24 15:50:10 00:05:36.226
> > > >
> > > >
> > > > This is where I stopped checking out Libvirt at first, concluding
> > > > wrongly that it
> > > > was immune to the bug.
> > > >
> > > > Libvirt does not throw an error when trying to apply snapshot 0. In
> > > > fact, it acts
> > > > like everything went fine:
> > > >
> > > > $ sudo virsh snapshot-revert dhb 0
> > > >
> > > > $ echo $?
> > > > 0
> > > 
> > > Is that a libvirt bug?
> > 
> > Probably yes. The error handling from HMP sucks and can't really be
> > fixed in all cases. This is for the handler which calls "loadvm":
> > 
> >     if (strstr(reply, "No block device supports snapshots") != NULL) {
> >         virReportError(VIR_ERR_OPERATION_INVALID, "%s",
> >                        _("this domain does not have a device to load snapshots"));
> >         goto cleanup;
> >     } else if (strstr(reply, "Could not find snapshot") != NULL) {
> >         virReportError(VIR_ERR_OPERATION_INVALID,
> >                        _("the snapshot '%s' does not exist, and was not loaded"),
> >                        name);
> >         goto cleanup;
> >     } else if (strstr(reply, "Snapshots not supported on device") != NULL) {
> >         virReportError(VIR_ERR_OPERATION_INVALID, "%s", reply);
> >         goto cleanup;
> >     } else if (strstr(reply, "Could not open VM state file") != NULL) {
> >         virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
> >         goto cleanup;
> >     } else if (strstr(reply, "Error") != NULL
> >              && strstr(reply, "while loading VM state") != NULL) {
> >         virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
> >         goto cleanup;
> >     } else if (strstr(reply, "Error") != NULL
> >              && strstr(reply, "while activating snapshot on") != NULL) {
> >         virReportError(VIR_ERR_OPERATION_FAILED, "%s", reply);
> >         goto cleanup;
> >     }
> > 
> > If the above does not match the reported error, we report success. The
> > same problem can happen with any of the 5 [1] HMP command handlers we
> > still have.
> > 
> > Note that a similar abomination is also for the code which calls
> > "savevm".
> 
> Would the following small qemu change make life a little safer:
> 
> diff --git a/hmp.c b/hmp.c
> index 576253a01f..0729a8c7ed 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -62,7 +62,7 @@ static void hmp_handle_error(Monitor *mon, Error **errp)
>  {
>      assert(errp);
>      if (*errp) {
> -        error_report_err(*errp);
> +        error_reportf_err(*errp, "Error: ");
>      }
>  }
> 
> changing:
> 
> No block device supports snapshots
> 
> 
> to:
> 
> Error: No block device supports snapshots
> 
> so you could check for Error: at the start and know that you've hit some
> unknown error?

That certainly would help, provided that the message is not localized in
any way. I'd be very glad to see it also for 'savevm' and 'delvm'.

It's a shame that drive_add and drive_del don't have that either, but
they will soon become unused.

Attachment: signature.asc
Description: PGP signature

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list

[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux