On 07.02.2012 16:18, Daniel P. Berrange wrote: > On Wed, Feb 01, 2012 at 06:04:45PM +0100, Michal Privoznik wrote: >> If we issue guest command and GA is not running, the issuing thread >> will block endlessly. We can check for GA presence by issuing >> guest-sync with unique ID (timestamp). We don't want to issue real >> command as even if GA is not running, once it is started, it process >> all commands written to GA socket. >> >> If we receive reply, it might be from a previous sync command, so >> we need to compare received ID for equality with the given one. >> --- >> >> Some background on this: >> -GA socked is always writeable, even with no GA running (!) >> >> -writing something to dead socket (=without any GA listening) will thus >> succeed. However, when one will (after a period of time) start the GA, >> it will read and process all written data. This makes timeout >> implementation harder. >> >> -Therefore issuing a GA command without any GA running will block indefinitely. >> >> My solution to this is: Issue harmless guest-sync command which returns >> given identifier. To avoid blocking, do a timed wait (5 seconds). >> This is the only case where we do a timed wait, regular GA commands >> will of course wait without any timeout. >> If GA didn't reply within those 5 seconds, we store an ID so we can compare >> it later and report 'GA unavailable'; Note that guest won't get into >> a different state as libvirt thinks it is in. >> Then, if GA is ever started, it start to process our sync commands. >> And all we have to do is check if we have ever issued such ID. >> If not an error is reported. >> >> However, there is still race: >> 1) GA will reply to our guest-sync >> 2) GA gets terminated >> 3) we issue regular command (e.g. fs-freeze) >> >> I am not sure this is avoidable. But this patch makes current situation bearable. > > We could still timeout the 'fs-freeze' command after 30 seconds > or so. Given that we issue the guest-resync command, we'll be > able to automatically re-sync the JSON protocol by dropping the > later arriving fs-freeze reply (if any). I don't think this is a good idea. I've chosen 'fs-freeze' intentionally :) It's something that actually might take ages - to sync disks (which is what current implementation does). Therefore if we set any timeout for regular commands we may get into inconsistent state: 1) issue fs-freeze 2) timeout and return error (everybody thinks fs is not frozen) 3) receive "okay, frozen" from GA > >> >> diff --git a/src/qemu/qemu_agent.c b/src/qemu/qemu_agent.c >> index 9df5546..5ecf893 100644 >> --- a/src/qemu/qemu_agent.c >> +++ b/src/qemu/qemu_agent.c >> @@ -39,6 +39,7 @@ >> #include "virterror_internal.h" >> #include "json.h" >> #include "virfile.h" >> +#include "virtime.h" >> >> #define VIR_FROM_THIS VIR_FROM_QEMU >> >> @@ -106,6 +107,10 @@ struct _qemuAgent { >> /* If anything went wrong, this will be fed back >> * the next monitor msg */ >> virError lastError; >> + >> + /* Array of IDs issued to GA via guest-sync */ >> + unsigned long long *ids; >> + size_t ids_size; >> }; > > I'm not sure that we want / need to store all historic IDs. > At any time, aren't we only interested in the most recent ID > that we have sent. > > We always serialize commands we sent to the agent, so consider > we sent > > guest-sync: id=1 > ...times out... > guest-sync: id=2 > fs-freeze: > > After that first time out we see, we need to discard all > further data received from the agent until we see a guest-sync > command reply with id=2. If we see a guest-snc reply with > id=1, we don't care - we just drop it & carry on waiting > until we get one with id=2. Makes sense. > >> +/** >> + * qemuAgentGuestSync: >> + * @mon: Monitor >> + * >> + * Send guest-sync with unique ID and wait for >> + * reply. If we get one, check if received ID >> + * is equal to given. >> + * >> + * Returns: 0 on success, >> + * -1 otherwise >> + */ >> +static int >> +qemuAgentGuestSync(qemuAgentPtr mon) >> +{ >> + int ret = -1; >> + int send_ret; >> + unsigned long long id, id_ret; >> + qemuAgentMessage sync_msg; >> + >> + memset(&sync_msg, 0, sizeof sync_msg); >> + >> + if (virTimeMillisNow(&id) < 0) >> + return -1; >> + >> + if (virAsprintf(&sync_msg.txBuffer, >> + "{\"execute\":\"guest-sync\", " >> + "\"arguments\":{\"id\":%llu}}", id) < 0) { >> + virReportOOMError(); >> + return -1; >> + } >> + sync_msg.txLength = strlen(sync_msg.txBuffer); >> + >> + VIR_DEBUG("Sending guest-sync command with ID: %llu", id); > > According to the 'guest-sync' QMP spec, we need to send the magic byte > '0xFF' immediately before the guest-sync command data is sent. Yeah, and probably switch to new guest-sync-delimited command as soon as it's upstream. > > Regards, > Daniel -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list