Re: [PATCH] qemu: agent: Fix QEMU guest agent is not available due to an error

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

For this bug, by checking the object's type is valid or not will always be true.
We should check for the object's value type instead.

I will send the V2 patch.

Thanks,

BRs
Xiubo


On 07/07/2016 09:12, Xiubo Li wrote:
Hi All,

This patch I have test for more than 3 days using 5 different scripts at
the same time, and the test scripts and libvirtd are all running happily
till now.


Thanks,
BRs
Xiubo


On 06/07/2016 15:21, Xiubo Li wrote:
These days, we experienced one qemu guest agent is not available
error. Then the guest agent couldn't be used forever before rebooting
the libvirtd daemon or reboot the qemu-guest-agent in the VM(you can
also reboot the VM) to clear the priv->agentError.

This is one bug for a long time in many verisons of libvirtd:
https://bugzilla.redhat.com/show_bug.cgi?id=1090551

And there is one python script to reproduce this bug from the bugzilla:
https://github.com/aspirer/study/blob/master/qemu-guest-agent/test2.py
Just set the timeout to 0 at Line26, you can reproduce it very quickly:
rsp = libvirt_qemu.qemuAgentCommand(dom, cmd, 1, 0) // 1 -> 0

The reason why this could happen is that:

In case we received something like:  {"return": {}} for example, it is
likely that somebody:

1) Started GA which sent one "guest-ping" command with seconds equals
to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0), which makes this function
return "immediately" without waiting, and sleeps in virCondWait().

2) While in AgentIO thread the "guest-ping" command is sent successfully,
and then waiting for reply.

3) Then the GA thread is woken up and the mon->msg is set to NULL and
exit.

4) Now the AgentIO has received the reply for "guest-ping" command with
the mon->msg == NULL.

Before we just check the guest-sync case, and should also check for all
the other GA commands.

I have test many of the GA commands, which all can reporduce this bug.
This patch will check if this is the case and don't report an error but
return silently.

Signed-off-by: Xiubo Li <lixiubo@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Zhuoyu Zhang <zhangzhuoyu@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Wei Tang <tangwei@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Yaowei Bai <baiyaowei@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Qide Chen <chenqide@xxxxxxxxxxxxxxxxxxxx>
---
  src/qemu/qemu_agent.c | 30 +++++++++++++++++++++---------
  src/util/virjson.h    |  7 +++++++
  2 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c
index eeede6b..5f08ba0 100644
--- a/src/qemu/qemu_agent.c
+++ b/src/qemu/qemu_agent.c
@@ -308,7 +308,6 @@ qemuAgentIOProcessLine(qemuAgentPtr mon,
  {
      virJSONValuePtr obj = NULL;
      int ret = -1;
-    unsigned long long id;

      VIR_DEBUG("Line [%s]", line);

@@ -333,16 +332,29 @@ qemuAgentIOProcessLine(qemuAgentPtr mon,
              obj = NULL;
              ret = 0;
          } else {
-            /* If we've received something like:
-             *  {"return": 1234}
-             * it is likely that somebody started GA
-             * which is now processing our previous
-             * guest-sync commands. Check if this is
-             * the case and don't report an error but
+            /* In case we received something like:
+             *  {"return": {}} for example,
+             * it is likely that somebody:
+             *
+             * 1) Started GA which sent one "guest-ping" command with
+             * seconds equals to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0)
+             * makes this function return immediately without waiting,
+             * and sleeps in virCondWait().
+             *
+             * 2) While in AgentIO thread the "guest-ping" command is
sent
+             * successfully, and the waiting for reply.
+             *
+             * 3) Then the GA thread is woken up and the mon->msg is set
+             * to NULL and exit.
+             *
+             * 4) Now the AgentIO has received the reply for
"guest-ping"
+             * command with the mon->msg == NULL.
+             *
+             * Check if this is the case and don't report an error but
               * return silently.
               */
-            if (virJSONValueObjectGetNumberUlong(obj, "return", &id)
== 0) {
-                VIR_DEBUG("Ignoring delayed reply to guest-sync:
%llu", id);
+            if (virIsValidJSONType(obj)) {
+                VIR_DEBUG("Ignoring delayed command reply");
                  ret = 0;
                  goto cleanup;
              }
diff --git a/src/util/virjson.h b/src/util/virjson.h
index 66ed48a..d36ce8e 100644
--- a/src/util/virjson.h
+++ b/src/util/virjson.h
@@ -80,6 +80,13 @@ struct _virJSONValue {
      } data;
  };

+static inline bool
+virIsValidJSONType(virJSONValuePtr object)
+{
+    return ((object->type >= VIR_JSON_TYPE_OBJECT)
+        && (object->type <= VIR_JSON_TYPE_BOOLEAN));
+}
+
  void virJSONValueFree(virJSONValuePtr value);

  int virJSONValueObjectCreate(virJSONValuePtr *obj, ...)



--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list



[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]