These days, we experienced one qemu guest agent is not available error. Then the guest agent couldn't be used forever before rebooting the libvirtd daemon or reboot the qemu-guest-agent in the VM(you can also reboot the VM) to clear the priv->agentError. This is one bug for a long time in many verisons of libvirtd: https://bugzilla.redhat.com/show_bug.cgi?id=1090551 And there is one python script to reproduce this bug from the bugzilla: https://github.com/aspirer/study/blob/master/qemu-guest-agent/test2.py Just set the timeout to 0 at Line26, you can reproduce it very quickly: rsp = libvirt_qemu.qemuAgentCommand(dom, cmd, 1, 0) // 1 -> 0 The reason why this could happen is that: In case we received something like: {"return": {}} for example, it is likely that somebody: 1) Started GA which sent one "guest-ping" command with seconds equals to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0), which makes this function return "immediately" without waiting, and sleeps in virCondWait(). 2) While in AgentIO thread the "guest-ping" command is sent successfully, and then waiting for reply. 3) Then the GA thread is woken up and the mon->msg is set to NULL and exit. 4) Now the AgentIO has received the reply for "guest-ping" command with the mon->msg == NULL. Before we just check the guest-sync case, and should also check for all the other GA commands. I have test many of the GA commands, which all can reporduce this bug. This patch will check if this is the case and don't report an error but return silently. Signed-off-by: Xiubo Li <lixiubo@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Zhuoyu Zhang <zhangzhuoyu@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Wei Tang <tangwei@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Yaowei Bai <baiyaowei@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Qide Chen <chenqide@xxxxxxxxxxxxxxxxxxxx> --- src/qemu/qemu_agent.c | 30 +++++++++++++++++++++--------- src/util/virjson.h | 7 +++++++ 2 files changed, 28 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c index eeede6b..5f08ba0 100644 --- a/src/qemu/qemu_agent.c +++ b/src/qemu/qemu_agent.c @@ -308,7 +308,6 @@ qemuAgentIOProcessLine(qemuAgentPtr mon, { virJSONValuePtr obj = NULL; int ret = -1; - unsigned long long id; VIR_DEBUG("Line [%s]", line); @@ -333,16 +332,29 @@ qemuAgentIOProcessLine(qemuAgentPtr mon, obj = NULL; ret = 0; } else { - /* If we've received something like: - * {"return": 1234} - * it is likely that somebody started GA - * which is now processing our previous - * guest-sync commands. Check if this is - * the case and don't report an error but + /* In case we received something like: + * {"return": {}} for example, + * it is likely that somebody: + * + * 1) Started GA which sent one "guest-ping" command with + * seconds equals to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0) + * makes this function return immediately without waiting, + * and sleeps in virCondWait(). + * + * 2) While in AgentIO thread the "guest-ping" command is sent + * successfully, and the waiting for reply. + * + * 3) Then the GA thread is woken up and the mon->msg is set + * to NULL and exit. + * + * 4) Now the AgentIO has received the reply for "guest-ping" + * command with the mon->msg == NULL. + * + * Check if this is the case and don't report an error but * return silently. */ - if (virJSONValueObjectGetNumberUlong(obj, "return", &id) == 0) { - VIR_DEBUG("Ignoring delayed reply to guest-sync: %llu", id); + if (virIsValidJSONType(obj)) { + VIR_DEBUG("Ignoring delayed command reply"); ret = 0; goto cleanup; } diff --git a/src/util/virjson.h b/src/util/virjson.h index 66ed48a..d36ce8e 100644 --- a/src/util/virjson.h +++ b/src/util/virjson.h @@ -80,6 +80,13 @@ struct _virJSONValue { } data; }; +static inline bool +virIsValidJSONType(virJSONValuePtr object) +{ + return ((object->type >= VIR_JSON_TYPE_OBJECT) + && (object->type <= VIR_JSON_TYPE_BOOLEAN)); +} + void virJSONValueFree(virJSONValuePtr value); int virJSONValueObjectCreate(virJSONValuePtr *obj, ...) -- 1.8.3.1 -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list