Laine Stump wrote: > On 01/14/2011 04:26 PM, Jim Fehlig wrote: >> Laine Stump wrote: >>> On 01/13/2011 04:29 PM, Jim Fehlig wrote: >>>> Daniel P. Berrange wrote: >>>>>> Yep, it wasn't really intended as a fix. It was intended to make >>>>>> the error scenario clearly detectable, which has succeeded as per >>>>>> Jim's report. The fact that QMP returned an error in this way, >>>>>> means we can now reliably detect, usleep(1000), and then retry >>>>>> the 'cont' again at which point we should succeed. >>>>>> >>>> Something like the attached patch? I'm not quite sure about the retry >>>> bounds (currently 1 second). In my testing, it always succeeds on the >>>> second attempt - even with large memory vms. >>> In my tests, 250ms was more than enough, so I'm guessing it's okay, >>> although there's probably no hurt in making it a bit larger - it's not >>> like this is something that repeats all day every day :-) >>> >>> Would it be possible for you to add the same thing into the text >>> monitor version of the code, so both fixes would travel together as >>> the patch gets cherry-picked around? >> I can, but have not seen this issue with the text monitor. And the >> error reporting is not so fine grained correct? > > Truthfully I haven't spent any time with the JSON code, and only > enough with the text monitor to (locally) put in a rough hack for the > same problem - I just delayed 250ms unconditionally. > > Maybe it is safest for your patch to just have what you're able to > test for. Agreed. What error do you get back from the text monitor when cont cmd fails? I'd be interested in the following output from $root/src/qemu/qemu_monitor_text.c:qemuMonitorCommandWithHandler() VIR_DEBUG("Receive command reply ret=%d errno=%d %d bytes '%s'", ret, msg.lastErrno, msg.rxLength, msg.rxBuffer); > > Of course my system isn't setup to test exactly this fix either - the > machine that displays the problem when resuming is still running > qemu-kvm-0.12.5. > > >>>> - if (ret == 0) >>>> - ret = qemuMonitorJSONCheckError(cmd, reply); >>>> + if (ret != 0) >>>> + break; >>>> + >>>> + /* If no error, we're done */ >>>> + if ((ret = qemuMonitorJSONCheckError(cmd, reply)) == 0) >>>> + break; >>>> + >>>> + /* If error class is not MigrationExpected, we're done. >>>> + * Otherwise try 'cont' cmd again */ >>>> + if (!qemuMonitorJSONHasError(reply, "MigrationExpected")) >>>> + break; >>>> + >>>> + virJSONValueFree(reply); >>>> + } while ((++i<= timeout)&& (usleep(250000)<=0)); >>>> >>>> virJSONValueFree(cmd); >>>> virJSONValueFree(reply); >>> Doesn't this end up doing a double-free of reply if it times out? >>> virJSONValueFree doesn't update the pointer that's free'd like >>> VIR_FREE does (it can't, since it's a function call rather than a >>> macro). >> virJSONValueFree() calls VIR_FREE() on the value passed to it, so reply >> should be set to NULL when virJSONValueFree() returns. > > > Yes, but it's pass by value, not reference, so VIR_FREE() is NULLing > the copy of the pointer that was passed as an argument, not the > original pointer itself. Err, right. I'll fix this in v2 - but need to understand the text monitor error first. Thanks, Jim -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list