On 07/07/2016 05:53 AM, Xiubo Li wrote: > These days, we experienced one qemu guest agent is not available > error. Then the guest agent couldn't be used forever before rebooting > the libvirtd daemon or reboot the qemu-guest-agent in the VM(you can > also reboot the VM) to clear the priv->agentError. > > This is one bug for a long time in many verisons of libvirtd: > https://bugzilla.redhat.com/show_bug.cgi?id=1090551 > > And there is one python script to reproduce this bug from the bugzilla: > https://github.com/aspirer/study/blob/master/qemu-guest-agent/test2.py > Just set the timeout to 0 at Line26, you can reproduce it very quickly: > rsp = libvirt_qemu.qemuAgentCommand(dom, cmd, 1, 0) // 1 -> 0 > > The reason why this could happen is that: > > In case we received something like {"return": {}} for example, it is > likely that somebody: > > 1) Started GA thread which sent one "guest-ping" command with seconds > equals to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0) makes this function > return immediately without waiting, but after updating the events, it > will sleep in virCondWait() or virCondWaitUntil(). > > 2) While in AgentIO thread the "guest-ping" command is sent successfully, > with updating the events it will wait for reply. > > 3) Then the GA thread is woken up and the mon->msg is set to NULL and exit. > > 4) At last the AgentIO thread receives the reply of "guest-ping" command > with the mon->msg == NULL. > > This case could be adapt to all the GA commands, including guest-sync, etc. > > This patch will check if this is the case and don't report an error but > return silently. > > Signed-off-by: Xiubo Li <lixiubo@xxxxxxxxxxxxxxxxxxxx> > Signed-off-by: Zhuoyu Zhang <zhangzhuoyu@xxxxxxxxxxxxxxxxxxxx> > Signed-off-by: Wei Tang <tangwei@xxxxxxxxxxxxxxxxxxxx> > Signed-off-by: Yaowei Bai <baiyaowei@xxxxxxxxxxxxxxxxxxxx> > Signed-off-by: Qide Chen <chenqide@xxxxxxxxxxxxxxxxxxxx> > --- > > Changes in V2: > - check value type instead of object type. > > > > src/qemu/qemu_agent.c | 37 ++++++++++++++++++++++++++++--------- > src/util/virjson.h | 7 +++++++ > 2 files changed, 35 insertions(+), 9 deletions(-) > Since this has been sitting around for a long time with a response - figured I'd point out something that was pushed today: http://www.redhat.com/archives/libvir-list/2016-September/msg00570.html It addresses/resolves the issue in essentially the same manner, although it doesn't check the empty {} reply, rather it just makes the else condition be a ignored delay reply.... There's also a few more patches to the series. John > diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c > index eeede6b..f5408cc 100644 > --- a/src/qemu/qemu_agent.c > +++ b/src/qemu/qemu_agent.c > @@ -308,7 +308,6 @@ qemuAgentIOProcessLine(qemuAgentPtr mon, > { > virJSONValuePtr obj = NULL; > int ret = -1; > - unsigned long long id; > > VIR_DEBUG("Line [%s]", line); > > @@ -333,16 +332,36 @@ qemuAgentIOProcessLine(qemuAgentPtr mon, > obj = NULL; > ret = 0; > } else { > - /* If we've received something like: > - * {"return": 1234} > - * it is likely that somebody started GA > - * which is now processing our previous > - * guest-sync commands. Check if this is > - * the case and don't report an error but > + virJSONValuePtr val; > + > + /* In case we received something like: > + * {"return": {}} for example, > + * it is likely that somebody: > + * > + * 1) Started GA thread which sent one "guest-ping" command > + * with seconds equals to VIR_DOMAIN_QEMU_AGENT_COMMAND_NOWAIT(0) > + * makes this function return immediately without waiting, but > + * after updating the events, it will sleep in virCondWait() or > + * virCondWaitUntil(). > + * > + * 2) While in AgentIO thread the "guest-ping" command is sent > + * successfully, with updating the events it will wait for reply. > + * > + * 3) Then the GA thread is woken up and the mon->msg is set > + * to NULL and exit. > + * > + * 4) At last the AgentIO thread receives the reply of "guest-ping" > + * command with the mon->msg == NULL. > + * > + * This case could be adapt to all the GA commands, including > + * guest-sync, etc. > + * > + * Check if this is the case and don't report an error but > * return silently. > */ > - if (virJSONValueObjectGetNumberUlong(obj, "return", &id) == 0) { > - VIR_DEBUG("Ignoring delayed reply to guest-sync: %llu", id); > + val = virJSONValueObjectGet(obj, "return"); > + if (val && virJSONValueTypeIsValid(val)) { > + VIR_DEBUG("Ignoring delayed command reply"); > ret = 0; > goto cleanup; > } > diff --git a/src/util/virjson.h b/src/util/virjson.h > index 66ed48a..cb1d973 100644 > --- a/src/util/virjson.h > +++ b/src/util/virjson.h > @@ -80,6 +80,13 @@ struct _virJSONValue { > } data; > }; > > +static inline bool > +virJSONValueTypeIsValid(virJSONValuePtr value) > +{ > + return ((value->type >= VIR_JSON_TYPE_OBJECT) && > + (value->type <= VIR_JSON_TYPE_BOOLEAN)); > +} > + > void virJSONValueFree(virJSONValuePtr value); > > int virJSONValueObjectCreate(virJSONValuePtr *obj, ...) > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list