Re: [libvirt PATCH 2/2] conf: log error on attempts to modify ACPI index of active device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/17/21 3:22 AM, Peter Krempa wrote:
On Thu, Sep 16, 2021 at 21:16:03 -0400, Laine Stump wrote:
On 9/13/21 11:38 AM, Peter Krempa wrote:
On Fri, Sep 10, 2021 at 11:18:20 -0400, Laine Stump wrote:
On 9/10/21 3:30 AM, Peter Krempa wrote:
On Thu, Sep 09, 2021 at 13:46:41 -0400, Laine Stump wrote:

Look, my problem is that we have an utterly inconsistent user story in
regards to device update and we are repeating our own mistakes.

While I also think that consistency is paramount in making an API understandable, predictable, and thus usable, I think that description is a bit hyperbolic...[*]

If we are stating that the user should use the FULL xml for update, then
they should use the full XML and we should not arbitrarily choose what
we allow.

Similarly if we want to allow partial XML then that's fine as long as we
are consistent with ourselves, the user expectations and the docs.

While the man page for virsh update-device says nothing on this subject, the documentation for virDomainUpateDeviceFlags() does say this:

  The supplied XML description of the device should
  contain all the information that is found in the
  corresponding domain XML. Leaving out any piece of
  information may be treated as a request for its
  removal, which may be denied. For instance, when
  users want to change CDROM media only for live
  XML, they must provide live disk XML as found in
  the corresponding live domain XML with only the
  disk path changed.

So the official documentation says (in a clause that was ironically added in response to the same Bug report (1621910) that added the "unspecified exception" for alias!!) that omitting an attribute may (*may*! - what a "weasel word" :-P) be treated as a request for its removal.

As far as requiring the user to read the current live XML and use that as the starting point for a device-update - while I think that is a noble goal, many (well, at least 2!) management applications don't ever read the domain XML back from libvirt - RHV at least didn't in the past when that bug was filed (don't know if it's still the case), and Kubevirt/CNV simply *can't* (as far as I understand it); all domain config is a one way street for them (or at least that's what was explained to us a couple years ago). So requiring update-device to be given a modified "current live XML" of a device is just going to be impossible.


Now for the repeating of our mistake part:

1) originally the alias was completely ignored in device-update, since it
was always set automatically by libvirt, and so it was a read-only attribute
from the application's PoV.

2) Then user-specified aliases were added, with no associated change to the
update-device code, and QE testing found that a requested change to alias
during live device update was silently ignored:

    https://bugzilla.redhat.com/1585108

Same for the ACPI index. It was added, and device-update was neglected.

Yes. This has happened several times. Just as it used to happen with, e.g. new attributes being added and not documented (or not properly validated during parsing), or new devices being added and hotplug support forgotten. Those others have been internalized by everyone now, so invariably they'll be caught in review. Hopefully "remember to add the check in update-device" will also be internalized and caught more often in review (some things actually *can* be changed at runtime, and so remembering this might add useful functionality, not just a hand-slap for trying :-))



3) This led to a patch that checked for any change in alias in the
update-device XML, with no special case for "not specified" (i.e. basically
what you're suggesting the code should be):


https://gitlab.com/libvirt/libvirt/-/commit/4ad54a417a1bbe10aeffcce783f4381d8d43f799

Yup, that's also exactly what this patch is doing.

Yes, as the documentation says we should.


4) That patch resulted in a bug filed by RHV saying that the behavior of the
update-device API had changed, which broke their existing code for changing
the media in a CD drive or something like that (they don't read back the
normalized XML after they've started the guest, so they don't have any info
about auto-generated values, and thus weren't including alias in the XML
they sent to update-device (this was before RHV started specifying their own
aliases):

    https://bugzilla.redhat.com/1621910

So this is still going to be a hypothetical.

Imagine a management application that uses partial XML when updating a
device because they generate it rather than taking a snippet from
dumpxml and modifying it. (We do have an example at least in terms of
disk update here.)

The same way as in libvirt, they figure out that they need to start
using the ACPI index for some reason. They implement it in their full
domain XML generator, but similarly to libvirt where we neglect the
update-device code they don't modify the generator that generates the
update XML.

So now they are using the ACPI index when defining the XML but not when
updating the device. The update-device code in libvirt thinks now that
they are attempting to change the ACPI index, which would seem to be a
regression, because nobody actually touched the update device XML
generator. This is also amplified by the fact libvirt actually fixed a
similar bug before. Bam a new BZ gets filed for this.

... and we reject it, saying "*You* updated part of your application to support setting ACPI index, and *you* didn't update all of it, so *you* have a bug that *you* need to fix in *your* application."


I grant you a mitigating circumstance that ACPI index is not really a
commonly used feature, so there's a chance that the scenario above might
not happen, but that doesn't mean that it's not a latent problem given
what we've done before.

acpi index is a totally different (to be hyperbolic!) situation from alias (which, again, was 1) originally not settable by the user, and 2) automatically generated by libvirt, while acpi index has, for as long as it has existed, 1) been settable (only) by the user, and 2) is never automatically generated by libvirt.




5) *That* bug resulted in the code being changed to what it is today, with
this patch:


https://gitlab.com/libvirt/libvirt/-/commit/b48d9e939bcf32a8d6e571313637e2eefe52e117

So essentially we are trapped into this "ignore unspecified alias" semantic
because alias previously existed but wasn't user-specified, and suddenly
requiring it broke existing applications trying to do things completely
unrelated to aliases.

See, we've acknowledged actually that partial XMLs are fine and thus
there was my reasoning that the same logic as with the alias should be
applied, otherwise it's opening a gate for another history repeat.


I disagree. In the case of alias, the bug was reported when an existing piece of functionality in the management application (RHV) was broken *with no change to RHV*, simply by updating libvirt. ("I didn't change *anything* in my application, just updated libvirt and now my application (which doesn't know or care about the new libvirt feature) doesn't work!")

In your hypothetical situation, the management application consciously added support for acpi index, but only did it halfway. As long as they ignored the existence of acpi index in the rest of their application, they would never see a problem when using update-device; this is because libvirt will never set acpi index, the application itself must set it. (the same can be said for virtio options, which have no exception for "not specified"); if they're going to go to the trouble of setting it in the initial config, then they know what it's set to, and can/should set it in the XML sent to device-update. ("I updated my application to support this new libvirt feature, and now my application doesn't work!")

([*]P.S. On the other hand, there are a few rom-related attributes that are 1) only validated when updating an <interface>, but not when changing any other type of PCI device (of course we only support changing of interface, graphics, and disk), and 2) *do* have the "not specified" exception as alias does (added in response to https://bugzilla.redhat.com/1599513 - found as a result of my "morning spelunking") in spite of the fact that those options are *never* auto-generated by libvirt, so maybe your "bole" isn't so "hyper" after all :-P.)

(My opinion is that attributes that are auto-generated/auto-filled by libvirt when they aren't specified should always get the exception, while items that remain blank when unspecified should not. This will assure that a) existing applications won't be broken by a libvirt update while b) applications *halfway* updated to support a new feature will break. This can be easily described and is consistent.)




[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux