Re: cont command failing via JSON monitor on restore

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 01/14/2011 04:26 PM, Jim Fehlig wrote:
Laine Stump wrote:
On 01/13/2011 04:29 PM, Jim Fehlig wrote:
Daniel P. Berrange wrote:
  Yep, it wasn't really intended as a fix. It was intended to make
  the error scenario clearly detectable, which has succeeded as per
  Jim's report. The fact that QMP returned an error in this way,
  means we can now reliably detect, usleep(1000), and then retry
  the 'cont' again at which point we should succeed.

Something like the attached patch?  I'm not quite sure about the retry
bounds (currently 1 second).  In my testing, it always succeeds on the
second attempt - even with large memory vms.
In my tests, 250ms was more than enough, so I'm guessing it's okay,
although there's probably no hurt in making it a bit larger - it's not
like this is something that repeats all day every day :-)

Would it be possible for you to add the same thing into the text
monitor version of the code, so both fixes would travel together as
the patch gets cherry-picked around?
I can, but have not seen this issue with the text monitor.  And the
error reporting is not so fine grained correct?

Truthfully I haven't spent any time with the JSON code, and only enough with the text monitor to (locally) put in a rough hack for the same problem - I just delayed 250ms unconditionally.

Maybe it is safest for your patch to just have what you're able to test for.

Of course my system isn't setup to test exactly this fix either - the machine that displays the problem when resuming is still running qemu-kvm-0.12.5.


- if (ret == 0)
-        ret = qemuMonitorJSONCheckError(cmd, reply);
+        if (ret != 0)
+            break;
+
+        /* If no error, we're done */
+        if ((ret = qemuMonitorJSONCheckError(cmd, reply)) == 0)
+            break;
+
+        /* If error class is not MigrationExpected, we're done.
+         * Otherwise try 'cont' cmd again */
+        if (!qemuMonitorJSONHasError(reply, "MigrationExpected"))
+            break;
+
+        virJSONValueFree(reply);
+    } while ((++i<= timeout)&&   (usleep(250000)<=0));

       virJSONValueFree(cmd);
       virJSONValueFree(reply);
Doesn't this end up doing a double-free of reply if it times out?
virJSONValueFree doesn't update the pointer that's free'd like
VIR_FREE does (it can't, since it's a function call rather than a macro).
virJSONValueFree() calls VIR_FREE() on the value passed to it, so reply
should be set to NULL when virJSONValueFree() returns.


Yes, but it's pass by value, not reference, so VIR_FREE() is NULLing the copy of the pointer that was passed as an argument, not the original pointer itself.


While looking at the patch again, not sure I'm fond of the

while ((++i<= timeout)&&   (usleep(250000)<=0));

Perhaps should move the usleep inside the loop and clean that up a bit.

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list


[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]