[PATCHv2] qemu: agent: Fix QEMU guest agent is not available due to an error

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



These days, we experienced one qemu guest agent is not available
error. Then the guest agent couldn't be used forever before rebooting
the libvirtd daemon or reboot the qemu-guest-agent in the VM(you can
also reboot the VM) to clear the priv->agentError.

This is one bug for a long time in many verisons of libvirtd:
https://bugzilla.redhat.com/show_bug.cgi?id=1090551

And there is one python script to reproduce this bug from the bugzilla:
https://github.com/aspirer/study/blob/master/qemu-guest-agent/test2.py
Just set the timeout to 0 at Line26, you can reproduce it very quickly:
rsp = libvirt_qemu.qemuAgentCommand(dom, cmd, 1, 0) // 1 -> 0

The reason why this could happen is that:

In case we received something like {"return": {}} for example, it is
likely that somebody:

1) Started GA thread which sent one "guest-ping" command with seconds
equals to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0) makes this function
return immediately without waiting, but after updating the events, it
will sleep in virCondWait() or virCondWaitUntil().

2) While in AgentIO thread the "guest-ping" command is sent successfully,
with updating the events it will wait for reply.

3) Then the GA thread is woken up and the mon->msg is set to NULL and exit.

4) At last the AgentIO thread receives the reply of "guest-ping" command
with the mon->msg == NULL.

This case could be adapt to all the GA commands, including guest-sync, etc.

This patch will check if this is the case and don't report an error but
return silently.

Signed-off-by: Xiubo Li <lixiubo@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Zhuoyu Zhang <zhangzhuoyu@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Wei Tang <tangwei@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Yaowei Bai <baiyaowei@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Qide Chen <chenqide@xxxxxxxxxxxxxxxxxxxx>
---

Changes in V2:
- check value type instead of object type.



 src/qemu/qemu_agent.c | 37 ++++++++++++++++++++++++++++---------
 src/util/virjson.h    |  7 +++++++
 2 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
index eeede6b..f5408cc 100644
--- a/src/qemu/qemu_agent.c
+++ b/src/qemu/qemu_agent.c
@@ -308,7 +308,6 @@ qemuAgentIOProcessLine(qemuAgentPtr mon,
 {
     virJSONValuePtr obj = NULL;
     int ret = -1;
-    unsigned long long id;
 
     VIR_DEBUG("Line [%s]", line);
 
@@ -333,16 +332,36 @@ qemuAgentIOProcessLine(qemuAgentPtr mon,
             obj = NULL;
             ret = 0;
         } else {
-            /* If we've received something like:
-             *  {"return": 1234}
-             * it is likely that somebody started GA
-             * which is now processing our previous
-             * guest-sync commands. Check if this is
-             * the case and don't report an error but
+            virJSONValuePtr val;
+
+            /* In case we received something like:
+             *  {"return": {}} for example,
+             * it is likely that somebody:
+             *
+             * 1) Started GA thread which sent one "guest-ping" command
+             * with seconds equals to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0)
+             * makes this function return immediately without waiting, but
+             * after updating the events, it will sleep in virCondWait() or
+             * virCondWaitUntil().
+             *
+             * 2) While in AgentIO thread the "guest-ping" command is sent
+             * successfully, with updating the events it will wait for reply.
+             *
+             * 3) Then the GA thread is woken up and the mon->msg is set
+             * to NULL and exit.
+             *
+             * 4) At last the AgentIO thread receives the reply of "guest-ping"
+             * command with the mon->msg == NULL.
+             *
+             * This case could be adapt to all the GA commands, including
+             * guest-sync, etc.
+             *
+             * Check if this is the case and don't report an error but
              * return silently.
              */
-            if (virJSONValueObjectGetNumberUlong(obj, "return", &id) == 0) {
-                VIR_DEBUG("Ignoring delayed reply to guest-sync: %llu", id);
+            val = virJSONValueObjectGet(obj, "return");
+            if (val && virJSONValueTypeIsValid(val)) {
+                VIR_DEBUG("Ignoring delayed command reply");
                 ret = 0;
                 goto cleanup;
             }
diff --git a/src/util/virjson.h b/src/util/virjson.h
index 66ed48a..cb1d973 100644
--- a/src/util/virjson.h
+++ b/src/util/virjson.h
@@ -80,6 +80,13 @@ struct _virJSONValue {
     } data;
 };
 
+static inline bool
+virJSONValueTypeIsValid(virJSONValuePtr value)
+{
+    return ((value->type >= VIR_JSON_TYPE_OBJECT) &&
+            (value->type <= VIR_JSON_TYPE_BOOLEAN));
+}
+
 void virJSONValueFree(virJSONValuePtr value);
 
 int virJSONValueObjectCreate(virJSONValuePtr *obj, ...)
-- 
1.8.3.1



--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list



[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]