Laine Stump wrote: > On 01/13/2011 04:29 PM, Jim Fehlig wrote: >> Daniel P. Berrange wrote: >>> > >>> > Yep, it wasn't really intended as a fix. It was intended to make >>> > the error scenario clearly detectable, which has succeeded as per >>> > Jim's report. The fact that QMP returned an error in this way, >>> > means we can now reliably detect, usleep(1000), and then retry >>> > the 'cont' again at which point we should succeed. >>> > >> Something like the attached patch? I'm not quite sure about the retry >> bounds (currently 1 second). In my testing, it always succeeds on the >> second attempt - even with large memory vms. > > In my tests, 250ms was more than enough, so I'm guessing it's okay, > although there's probably no hurt in making it a bit larger - it's not > like this is something that repeats all day every day :-) > > Would it be possible for you to add the same thing into the text > monitor version of the code, so both fixes would travel together as > the patch gets cherry-picked around? I can, but have not seen this issue with the text monitor. And the error reporting is not so fine grained correct? > >> 0001-qemu-Retry-JSON-monitor-cont-cmd-on-MigrationExpecte.patch >> >> >> > From ec9109b40a4b2c45035495e0e4f65824a92dcf3d Mon Sep 17 00:00:00 2001 >> From: Jim Fehlig<jfehlig@xxxxxxxxxx> >> Date: Thu, 13 Jan 2011 12:52:23 -0700 >> Subject: [PATCH] qemu: Retry JSON monitor cont cmd on >> MigrationExpected error >> >> When restoring a saved qemu instance via JSON monitor, the vm is >> left in a paused state. Turns out the 'cont' cmd was failing with >> "MigrationExpected" error class and "An incoming migration is >> expected before this command can be executed" error description >> due to migration (restore) not yet complete. >> >> Detect if 'cont' cmd fails with "MigrationExpecte" error class and >> retry 'cont' cmd. >> --- >> src/qemu/qemu_monitor_json.c | 20 +++++++++++++++++--- >> 1 files changed, 17 insertions(+), 3 deletions(-) >> >> diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c >> index 7877731..63b736e 100644 >> --- a/src/qemu/qemu_monitor_json.c >> +++ b/src/qemu/qemu_monitor_json.c >> @@ -702,13 +702,27 @@ qemuMonitorJSONStartCPUs(qemuMonitorPtr mon, >> int ret; >> virJSONValuePtr cmd = qemuMonitorJSONMakeCommand("cont", NULL); >> virJSONValuePtr reply = NULL; >> + int i = 0, timeout = 3; >> if (!cmd) >> return -1; >> >> - ret = qemuMonitorJSONCommand(mon, cmd,&reply); >> + do { >> + ret = qemuMonitorJSONCommand(mon, cmd,&reply); >> >> - if (ret == 0) >> - ret = qemuMonitorJSONCheckError(cmd, reply); >> + if (ret != 0) >> + break; >> + >> + /* If no error, we're done */ >> + if ((ret = qemuMonitorJSONCheckError(cmd, reply)) == 0) >> + break; >> + >> + /* If error class is not MigrationExpected, we're done. >> + * Otherwise try 'cont' cmd again */ >> + if (!qemuMonitorJSONHasError(reply, "MigrationExpected")) >> + break; >> + >> + virJSONValueFree(reply); >> + } while ((++i<= timeout)&& (usleep(250000)<=0)); >> >> virJSONValueFree(cmd); >> virJSONValueFree(reply); > > Doesn't this end up doing a double-free of reply if it times out? > virJSONValueFree doesn't update the pointer that's free'd like > VIR_FREE does (it can't, since it's a function call rather than a macro). virJSONValueFree() calls VIR_FREE() on the value passed to it, so reply should be set to NULL when virJSONValueFree() returns. While looking at the patch again, not sure I'm fond of the while ((++i<= timeout) && (usleep(250000)<=0)); Perhaps should move the usleep inside the loop and clean that up a bit. Regards, Jim -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list