On 11.03.2020 12:38, Michal Privoznik wrote: > On 3/5/20 3:47 PM, Nikolay Shirokovskiy wrote: >> needReply added in [1] looks redundant. Indeed it is set to false only >> when mon->await_event is set too (the only exception qemuAgentFSTrim >> which is mistaken). >> >> However it fixes the issue when qemuAgentCommand exits on error path and >> mon->await_event is not reset. Let's instead reset mon->await_event properly. >> >> Also remove "Woken up by event" debug message as it can be misleading. >> We can get it also if monitor is closed due to serial changed event >> currently. Anyway both qemuAgentClose and qemuAgentNotifyEvent log >> itself. >> >> [1] qemu: make sure agent returns error when required data are missing >> >> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx> >> --- >> src/qemu/qemu_agent.c | 47 ++++++++++++++++++++----------------------- >> 1 file changed, 22 insertions(+), 25 deletions(-) >> >> diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c >> index 2de53efb7a..605c9f563e 100644 >> --- a/src/qemu/qemu_agent.c >> +++ b/src/qemu/qemu_agent.c >> @@ -1112,7 +1112,6 @@ static int >> qemuAgentCommand(qemuAgentPtr mon, >> virJSONValuePtr cmd, >> virJSONValuePtr *reply, >> - bool needReply, >> int seconds) >> { >> int ret = -1; >> @@ -1121,17 +1120,16 @@ qemuAgentCommand(qemuAgentPtr mon, >> int await_event = mon->await_event; >> *reply = NULL; >> + memset(&msg, 0, sizeof(msg)); >> if (!mon->running) { >> virReportError(VIR_ERR_AGENT_UNRESPONSIVE, "%s", >> _("Guest agent disappeared while executing command")); >> - return -1; >> + goto cleanup; >> } >> if (qemuAgentGuestSync(mon) < 0) >> - return -1; >> - >> - memset(&msg, 0, sizeof(msg)); >> + goto cleanup; >> if (!(cmdstr = virJSONValueToString(cmd, false))) >> goto cleanup; >> @@ -1149,9 +1147,7 @@ qemuAgentCommand(qemuAgentPtr mon, >> /* If we haven't obtained any reply but we wait for an >> * event, then don't report this as error */ >> if (!msg.rxObject) { >> - if (await_event && !needReply) { >> - VIR_DEBUG("Woken up by event %d", await_event); >> - } else { > > Please keep this debug printing. It doesn't hurt anything and has potential of helping us to debug. This debug line is not correct. I reflected it in the commit message. I actually get this line during debug session when agent monitor is closed during VM shutdown on serial event and not shutdown event. Serial event just comes first. > >> + if (!await_event) { >> if (mon->running) >> virReportError(VIR_ERR_INTERNAL_ERROR, "%s", >> _("Missing monitor reply object")); >> @@ -1169,6 +1165,7 @@ qemuAgentCommand(qemuAgentPtr mon, >> cleanup: >> VIR_FREE(cmdstr); >> VIR_FREE(msg.txBuffer); >> + mon->await_event = QEMU_AGENT_EVENT_NONE; > > This is the part I don't like about this patch. The rest looks fine. Why is this needed? Either the mon->await_event is not set by the caller, or if set the the event handler will clear it out by calling qemuAgentNotifyEvent(). Or am I missing something? The whole point of the patch is in this line) Yeah, you are talking of success paths. But on error paths mon->await_event is not cleared currently so next command which need reply can be finish on await event without needReply which lead to SIGSGEV. This is what patch [1] fixes by introducing needReply which don't need to be cleared as it is on stack. I guess scenario is next given the bug [2] mentioned in [1]: - vm shutdown command is sent to GA - it is finished with timeout (60 s), mon->await_event is not cleared - guest ping is sent - guest shutdowned and ping command get awakend by this event as mon->await_event is set (bug) So needReply argument is helpful but it is cleaner just to reset mon->await_event properly. [1] qemu: make sure agent returns error when required data are missing [2] https://bugzilla.redhat.com/show_bug.cgi?id=1058149 Nikolay > >> return ret; >> } >> @@ -1269,7 +1266,7 @@ int qemuAgentShutdown(qemuAgentPtr mon, >> mon->await_event = QEMU_AGENT_EVENT_RESET; >> else >> mon->await_event = QEMU_AGENT_EVENT_SHUTDOWN; >> - ret = qemuAgentCommand(mon, cmd, &reply, false, >> + ret = qemuAgentCommand(mon, cmd, &reply, >> VIR_DOMAIN_QEMU_AGENT_COMMAND_SHUTDOWN); >> virJSONValueFree(cmd); >> @@ -1312,7 +1309,7 @@ int qemuAgentFSFreeze(qemuAgentPtr mon, const char **mountpoints, >> if (!cmd) >> goto cleanup; >> - if (qemuAgentCommand(mon, cmd, &reply, true, mon->timeout) < 0) >> + if (qemuAgentCommand(mon, cmd, &reply, mon->timeout) < 0) >> goto cleanup; >> if (virJSONValueObjectGetNumberInt(reply, "return", &ret) < 0) { >> @@ -1349,7 +1346,7 @@ int qemuAgentFSThaw(qemuAgentPtr mon) >> if (!cmd) >> return -1; >> - if (qemuAgentCommand(mon, cmd, &reply, true, mon->timeout) < 0) >> + if (qemuAgentCommand(mon, cmd, &reply, mon->timeout) < 0) >> goto cleanup; >> if (virJSONValueObjectGetNumberInt(reply, "return", &ret) < 0) { >> @@ -1386,7 +1383,7 @@ qemuAgentSuspend(qemuAgentPtr mon, >> return -1; >> mon->await_event = QEMU_AGENT_EVENT_SUSPEND; >> - ret = qemuAgentCommand(mon, cmd, &reply, false, mon->timeout); >> + ret = qemuAgentCommand(mon, cmd, &reply, mon->timeout); >> virJSONValueFree(cmd); >> virJSONValueFree(reply); >> @@ -1415,7 +1412,7 @@ qemuAgentArbitraryCommand(qemuAgentPtr mon, >> if (!(cmd = virJSONValueFromString(cmd_str))) >> goto cleanup; >> - if ((ret = qemuAgentCommand(mon, cmd, &reply, true, timeout)) < 0) >> + if ((ret = qemuAgentCommand(mon, cmd, &reply, timeout)) < 0) >> goto cleanup; >> if (!(*result = virJSONValueToString(reply, false))) >> @@ -1442,7 +1439,7 @@ qemuAgentFSTrim(qemuAgentPtr mon, >> if (!cmd) >> return ret; >> - ret = qemuAgentCommand(mon, cmd, &reply, false, mon->timeout); >> + ret = qemuAgentCommand(mon, cmd, &reply, mon->timeout); > > Oh this is fun. This 'false' you are removing should have been true as the command does have a reply that we need to wait for. > >> virJSONValueFree(cmd); >> virJSONValueFree(reply); > > Michal >