On Thu, Jun 07, 2012 at 12:03:02 -0600, Eric Blake wrote: > On 06/05/2012 02:36 AM, Guannan Ren wrote: > > Bugzilla:https://bugzilla.redhat.com/show_bug.cgi?id=822839 > > add two general virsh options to support keepalive message protocol > > > > -i | --keepalive_interval interval time value (default 5 seconds) > > -n | --keepalive_count number of heartbeats (default 5 times) > > > > For non-p2p migration, start keepalive for remote driver to > > determine the status of network connection, aborting migrating job > > after defined amount of interval time. > > --- > > tools/virsh.c | 88 +++++++++++++++++++++++++++++++++++++++++++++----------- > > 1 files changed, 70 insertions(+), 18 deletions(-) > > Missing a counterpart documentation in virsh.pod. > > > @@ -7213,6 +7219,12 @@ doMigrate (void *opaque) > > dconn = virConnectOpenAuth (desturi, virConnectAuthPtrDefault, 0); > > if (!dconn) goto out; > > > > + data->dconn = dconn; > > + if (virConnectSetKeepAlive(dconn, > > + ctl->keepalive_interval, > > + ctl->keepalive_count) < 0) > > + vshDebug(ctl, VSH_ERR_WARNING, "migrate: Failed to start keepalive\n"); > > This makes it impossible to migrate to an older server that lacks > virConnectSetKeepAlive() support. virConnectSetKeepAlive returns 1 when remote party does not support keepalive, which means the above code is perfectly compatible with older servers. > You need to avoid doing this if keepalive_interval and/or keepalive_count is > set to a value that avoids keepalive. I think that also means you need to > distinguish between a normal default of 5 seconds with new servers but > unspecified by the user (where we just gracefully continue without > keepalive), compared to an explicit user request of 5 seconds (must fail if > the keepalive request cannot be honored), and compared to an explicit user > request of no keepalive. But I agree that we need to distinguish between default keepalive settings and those explicitly requested. > > @@ -7329,6 +7342,13 @@ repoll: > > goto cleanup; > > } > > > > + if (data->dconn && virConnectIsAlive(data->dconn) <= 0) { > > + virDomainAbortJob(dom); > > + vshError(ctl, "%s", > > + _("Lost connection to destination host")); > > + goto cleanup; > > + } > > Again, need to be careful of older servers that lack virConnectIsAlive() > support. This doesn't call to the server at all. However, virConnectIsAlive returning -1 doesn't mean the connection is closed, especially if it sets the error to VIR_ERR_NO_SUPPORT. > > @@ -18681,6 +18703,18 @@ vshCommandRun(vshControl *ctl, const vshCmd *cmd) > > if (enable_timing) > > GETTIMEOFDAY(&before); > > > > + /* start keepalive for remote driver */ > > + driver = virConnectGetType(ctl->conn); > > + if (STREQ_NULLABLE(driver, "QEMU") || > > + STREQ_NULLABLE(driver, "xenlight") || > > + STREQ_NULLABLE(driver, "UML") || > > + STREQ_NULLABLE(driver, "LXC") || > > + STREQ_NULLABLE(driver, "remote")) > > Why are you limiting this to particular hypervisors? That feels wrong. > Rather, you should blindly attempt it, and gracefully detect when it is > unsupported. Yes, and another issue is that vshCommandRun is a bad place to do this. Keepalive should be enabled just after opening a connection, i.e., in vshReconnect, cmdConnect, and vshInit (in addition to doMigrate, where you correctly do so). I think virsh would benefit from a new wrapper for virConnectOpenAuth that would also deal with keepalive and which would be called from all four places. Jirka -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list