On Thu, Jun 23, 2016 at 12:34:35PM -0400, Laine Stump wrote:
On 06/23/2016 08:26 AM, Ján Tomko wrote:On Wed, Jun 22, 2016 at 03:00:47PM -0400, Laine Stump wrote:An attempt to attach a new scsi controller with both --live and --config but without specifying an index, e.g.: <controller model="virtio-scsi" type="scsi" /> led to this error: internal error: Cannot parse controller index -1 This was because unspecified indexes are auto-assigned during virDomainDefPostParse(), which doesn't happen for hotplugged devices until after the device has been added to the domainDef, but qemuDomainAttachFlags() makes a copy of the device (for feeding to qemuDomainAttachDeviceLive() *before* it's added to the config, and the copying function actually formats the device object and then re-parses it into a new object. Since qemuDomainAttachDeviceConfig() consumes the device object pointer (i.e. sets it to NULL in the original virDomainDeviceDef) we can't just wait to make the copy. Instead, we need to make a *shallow* copy of the virDomainDeviceDef prior to qemuDomainAttachDeviceConfig(), then make a deep copy of the shallow copy. This resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1344899 --- src/qemu/qemu_driver.c | 31 ++++++++++++++++++++----------- 1 file changed, 20 insertions(+), 11 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index fa05046..f3e17e2 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -8199,6 +8199,7 @@ qemuDomainAttachDeviceFlags(virDomainPtr dom, virDomainObjPtr vm = NULL; virDomainDefPtr vmdef = NULL; virDomainDeviceDefPtr dev = NULL, dev_copy = NULL; + virDomainDeviceDef dev_shallow; int ret = -1; unsigned int parse_flags = VIR_DOMAIN_DEF_PARSE_INACTIVE | VIR_DOMAIN_DEF_PARSE_ABI_UPDATE; @@ -8237,22 +8238,19 @@ qemuDomainAttachDeviceFlags(virDomainPtr dom, if (dev == NULL) goto endjob; - if (flags & VIR_DOMAIN_AFFECT_CONFIG && - flags & VIR_DOMAIN_AFFECT_LIVE) { - /* If we are affecting both CONFIG and LIVE - * create a deep copy of device as adding - * to CONFIG takes one instance. - */ - dev_copy = virDomainDeviceDefCopy(dev, vm->def, caps, driver->xmlopt); - if (!dev_copy) - goto endjob; - } - if (priv->qemuCaps) qemuCaps = virObjectRef(priv->qemuCaps); else if (!(qemuCaps = virQEMUCapsCacheLookup(driver->qemuCapsCache, vm->def->emulator))) goto endjob; + /* Save away the pointer to the device object before it is + * potentially swallowed up by qemuDomainAttachDeviceConfig(). + * This will allow us to make a copy of the device after any + * modifications made by virDomainDefPostParse() (which is called + * after the new device is added to the config + */ + dev_shallow = *dev; +This looks fragile and complicated, and I've already managed to break it with the XML you provided: $ virsh attach-device f24 cont --live Device attached successfully $ virsh attach-device f24 cont --live --config error: Failed to attach device from cont error: operation failed: target scsi:1 already existsYeah, I noticed that too just before I posted, and am still thinking about how to solve it, but I still think the situation is one step better with this patch in place (although it does have a similar effect on --live + --config attachment of PCI devices - changing from one improper behavior to another. Sigh.) This all comes down to a more general problem that I offhandledly mentioned in an email with Tomasz Flendrich (the GSoC student working on "device address abstraction") - other than the live state starting off as a copy of the config, we make no attempt to maintain consistency of device attributes between the two, and could easily end up with conflicting things between the two. Normally this is mostly transparent to the user, but could cause serious complaints from guest OSes when they are later restarted and all their nice cozy devices previously added with "--live --config" are suddenly moved (this may not make sense now, but there are some further examples below).
Even if the OS doesn't complain, we say we're trying to guarantee that guest ABI will not change. Re-plugging some devices causes that (not USB for example) and we should avoid it.
AFAIK the reason we create a deep copy instead of parsing it again isBut a "deep copy" is just formatting the object and then parsing the resulting XML again.our generation of MAC addresses in the parser: commit 1e0534a770208be6b848c961785db20467deb2fc qemu: Don't parse device twice in attach/detachWell, that's a bit misleading. It *is* parsing twice; it's just that after the patch, the 2nd parse is done on the result of formatting the result of the first parse rather than parsing the original xml again. Because the MAC address is auto-generated as a part of the parsing (rather than during post-parse), we're able to get a MAC address consistent between live and persistent without calling virDomainDefPostParse(). Anything set in the post-parse doesn't have this same happy existence though. (side-note - when I made the patch to allow auto-generating the index, I originally put it directly in the parser rather than post-parse, but this was (rightfully) NACKed by Cole: https://www.redhat.com/archives/libvir-list/2016-May/msg01065.html Doing it the original way would have avoided the "already exists" error, but not the more general problem of diverging/conflicting live state and persistent config.)So the question is: how should we treat the combination of --live and --config?Not just that, but what should we do with --live by itself? Should we allow someone to hotplug a device --live with attributes that would have failed if given to --config (i.e. if they specify a PCI address that is currently unused in the live domain, but is in use in the config)? I could see a valid use case for this - if the device was added with --live and the user later wanted to make that device permanent - but allowing it could also lead to undesired/unexpected device address changes.
it should be possible both ways: --live with one device and --config with other on the same address, and the other way around as well. Checking anything more would make the code way more complex just to tell the user it's *maybe* not what they wanted. Don't forget you can boot a totally different XML than the persistent one with CreateXML.
And what about the current bad behavior when you do this? virsh attach interface f24 --type network --source default --live virsh attach interface f24 --type network --source default --live --config
This can be separated into two different issues. If you do attach-interface, we generate an XML without address, so you should be able to do the above and have 2 more interfaces live, the second one would be identical to the only one added to config. However if you do attach-device on a domain with XML that has the address specified, first with only one of --live/--config and then with both, then the second call should fail.
Without my patch, the 2nd device ends up with a different PCI address in the live and persistent object. With my patch, you end up with a similar error to above (complaining about attempting to use a PCI address that's already in use). So which evil is worse? Refusing to give a device different info for live vs. config (arguably not *as bad* for a PCI address vs. a MAC address or controller index, but still very undesirable)? Or silently allowing the evil, and damn the consequences? The latter would just cause some extra bookkeeping in the guest for a PCI address (the next time you start it, the guest would think the original device had disappeared and been replaced with another at a different address, with varying levels of whinging), but if you change the index of a SCSI controller, you could end up with disks plugged into the wrong controller (hmm, I guess that's not a horrible thing either, unless the device name in the guest is based on the controller's position in the devices, and that device name is used in the OS config somewhere... Yuck - better to make sure it doesn't change.)Try to make the persistent device as close as the live one? Or try to make it match the persistent config (while the live one would match the live config?)I think we should make the two identical. For that to be true 100% of the time, we need to consider both the live and persistent config any time we're adding a device to either (or maybe just when we're adding it to both? I need to think or a minute, and I still haven't made my coffee).
Definitely totally identical or just rollback and return error.
(BTW, we also should consider what happens when you add a scsi disk with --live --config, attaching it to a controller that was added with just --live. I guess the controller should be automatically added to the config, but "dunno, haven't tried".)
I think that controller will be added automatically even if it is not in any of the definitions. However the fact that user/mgmt app treated live and config definitions differently gives us the ability to just simply handle them differently as well. We should not try to make automatic additions way too much automatic IMHO.
Either way, it would be nicer to get the device definition stable before we even try to add it to the persistent definition.The problem is that all of the stuff that "stabilizes" the device definition is done in the post-parse callback, and that requires that the device already be inserted into the domain definition.
Add the definition in the config, do postparse, check if that definition can be added into live, repeat with better data. Rolling back the addition to config is easy. I haven't thought it through yet, though.
Also, calling virDomainDefPostParse after device coldplug is strange, we should be adding a device that does not need ajdustments.Again, we don't have the information necessary to adjust it until we at least have the domain def available (and all the post-parse callback infrastructure assumes/requires that the device already be inserted into the domain). But if we don't do the *entire* virDomainDefPostParse() then we are in danger of missing something during a hotplug. Maybe we need to rethink how we can do the post-parse stuff so that: 1) both the persistent config and (optionally) live state are supplied to the device callbacks (with assurances that the live state is ignored if it's NULL) 2) address and index assignment aren't done with a single call in the domain callback that assigns everything, but instead are done one-by-one in the device post-parse callback, assuring that newly assigned addresses aren't in use in either the persistent config or (if present) the live state. (I think the best way to do this may actually be to resurrect the idea of the pciaddrs "cache" rather than killing it (as is currently being discussed)).
I had some ideas in my mind similar to this; it doesn't really matter if the allocation is extracted from PostParse or not and I think lot of stuff will change during the implementation =)
And now I find myself all out of words, but without a solution or sufficient motivation to look for one. Hopefully something will come up out of the conversation.
Friday, only small amount of coffee, well, it's not the right time to think this through for me, but we definitely need to take a broader approach to all the address assignments. Martin
Attachment:
signature.asc
Description: Digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list