On 10/27/2016 07:34 AM, Nikolay Shirokovskiy wrote: > > > On 26.10.2016 22:57, John Ferlan wrote: >> >> There's no commit message... > > This is simple refactoring. > Not really... A simple refactor would have kept the -2 logic. What's been done here is to let qemuConnectAgent make the decision on "hard" or "soft" error and only use < 0 as failure >> >> You're altering qemuConnectAgent to return -1 on the only error and >> perform the VIR_WARN plus agentError = true on other "soft" failures. >> >> Not exactly nothing going on! > > This is what already present in code, I've just moved soft failures > in qemuConnectAgent. > >> >> On 10/04/2016 02:56 AM, Nikolay Shirokovskiy wrote: >>> --- >>> src/qemu/qemu_driver.c | 8 +------ >>> src/qemu/qemu_migration.c | 13 ++--------- >>> src/qemu/qemu_process.c | 58 ++++++++++++----------------------------------- >>> 3 files changed, 17 insertions(+), 62 deletions(-) >>> >> >> This idea seems to have merit too - something that I've thought now that >> I've read the first 3 patches... >> >> I think you should have started with this patch, it probably would have >> made the others easier to think about. In fact, I'm curious if with just >> this change and the suggestion in patch 3 for clearing agentError >> whether you can reproduced the bug you're trying to fix/resolve without >> any of the other patches. >> >>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c >>> index 12ddbc0..edff973 100644 >>> --- a/src/qemu/qemu_driver.c >>> +++ b/src/qemu/qemu_driver.c >>> @@ -4397,7 +4397,6 @@ processSerialChangedEvent(virQEMUDriverPtr driver, >>> virObjectEventPtr event = NULL; >>> virDomainDeviceDef dev; >>> qemuDomainObjPrivatePtr priv = vm->privateData; >>> - int rc; >>> >>> if (connected) >>> newstate = VIR_DOMAIN_CHR_DEVICE_STATE_CONNECTED; >>> @@ -4450,13 +4449,8 @@ processSerialChangedEvent(virQEMUDriverPtr driver, >>> >>> if (STREQ_NULLABLE(dev.data.chr->target.name, "org.qemu.guest_agent.0")) { >>> if (newstate == VIR_DOMAIN_CHR_DEVICE_STATE_CONNECTED) { >>> - if (!priv->agent) { >>> - if ((rc = qemuConnectAgent(driver, vm)) == -2) >>> + if (!priv->agent && qemuConnectAgent(driver, vm) < 0) >> >> FWIW: >> qemuConnectAgent returns 0 "if (priv->agent)", so there's one less check >> here too. Could just be "if (qemuConnectAgent(driver, vm) < 0)" > > It depends on patch 3. If we clear the error early in qemuConnectAgent then > we'd better check before calling this function. > Right my mind was already there or more succinctly said - I was trying to consider this as the first patch as I think it'll make the reasoning a bit more clear. Since the "issue" has been described now as a pure shutdown and restart after error problem without needing to consider the migration, reconnect, and attach separately - it's easier to focus on. So my feeling now after reading and thinking a bit about everything so far is that this patch could go in (mostly) as is. That way there are then 2 places to set agentError=true and two places to set agentError=false (discluding original allocation and reconnect paths). I don't think we can "combine" the Error and EOF paths. They're separate. What happens if some day someone uses the VSERPORT logic to somehow restart an agent after an error, after a close, and before a shutdown. IOW: Some kind of error recovery logic. In order to handle the issue where agentError is set, but we cannot start again (e.g. the have error, do shutdown, do start path) - add an unconditional priv->agentError = false before in qemuConnectAgent before the virSecurityManagerSetDaemonSocketLabel call. The other agentError = false settings can then be removed. So we're left with 2 places to set error and one place to clear. I'll continue to consider this, bug figured I'd get out a response to all this sooner than later. I'm less convinced mucking with combining EOF and Error is a good idea in patches 7-10, but I haven't thought too much about it either. John >> >>> goto endjob; >> >> The indention for this is off (remove leading 4 spaces) >> >>> - >>> - if (rc < 0) >>> - priv->agentError = true; >>> - } >>> } else { >>> if (priv->agent) { >>> qemuAgentClose(priv->agent); >>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c >>> index e2ca330..0a02236 100644 >>> --- a/src/qemu/qemu_migration.c >>> +++ b/src/qemu/qemu_migration.c >>> @@ -6171,7 +6171,6 @@ qemuMigrationFinish(virQEMUDriverPtr driver, >>> unsigned short port; >>> unsigned long long timeReceived = 0; >>> virObjectEventPtr event; >>> - int rc; >>> qemuDomainJobInfoPtr jobInfo = NULL; >>> bool inPostCopy = false; >>> bool doKill = true; >>> @@ -6244,16 +6243,8 @@ qemuMigrationFinish(virQEMUDriverPtr driver, >>> QEMU_ASYNC_JOB_MIGRATION_IN) < 0) >>> goto endjob; >>> >>> - if ((rc = qemuConnectAgent(driver, vm)) < 0) { >>> - if (rc == -2) >>> - goto endjob; >>> - >>> - VIR_WARN("Cannot connect to QEMU guest agent for %s", >>> - vm->def->name); >>> - virResetLastError(); >>> - priv->agentError = true; >>> - } >>> - >>> + if (qemuConnectAgent(driver, vm) < 0) >>> + goto endjob; >>> >>> if (flags & VIR_MIGRATE_PERSIST_DEST) { >>> if (qemuMigrationPersist(driver, vm, mig, !v3proto) < 0) { >>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c >>> index 42f7f84..d7c9ce3 100644 >>> --- a/src/qemu/qemu_process.c >>> +++ b/src/qemu/qemu_process.c >>> @@ -206,7 +206,6 @@ int >>> qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm) >>> { >>> qemuDomainObjPrivatePtr priv = vm->privateData; >>> - int ret = -1; >>> qemuAgentPtr agent = NULL; >>> virDomainChrDefPtr config = qemuFindAgentConfig(vm->def); >>> >>> @@ -252,8 +251,7 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm) >>> qemuAgentClose(agent); >>> virReportError(VIR_ERR_INTERNAL_ERROR, "%s", >>> _("guest crashed while connecting to the guest agent")); >>> - ret = -2; >>> - goto cleanup; >>> + return -1; >>> } >>> >>> if (virSecurityManagerClearSocketLabel(driver->securityManager, >>> @@ -264,18 +262,16 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm) >>> goto cleanup; >>> } >>> >>> - >>> priv->agent = agent; >>> >>> - if (priv->agent == NULL) { >>> - VIR_INFO("Failed to connect agent for %s", vm->def->name); >> >> Interesting a "marker" of sorts to know the virAgentOpen "failed" is >> that we'd get an Informational "Failed to connect..." followed shortly >> thereafter by a Warning "Cannot connect..." (depending of course on >> one's message display severity level). >> >> I think if you "restore" this without the goto cleanup (below) and of >> course the removal of the {}, then at least message parity is >> achieved... I suppose it could move up into the "if (agent == NULL)" >> too, but then it could be given even though an Error is reported. >> >> It's not that important, but it does leave some breadcrumbs. > > Agree. Even though I'm still wondering how useful this message is as > this patch is intended to be refactoring let's keep this message. > >> >> Again I'd like to see the breadcrumbs with just this patch applied to >> reproduced what you're chasing in patches 1-4... >> > > Sorry, not possible. The situation I've described in patch 1 is based on analysis. However > after applying the patches 1-3 the bug is not occured in our test system anymoree. > > Nikolay > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list