Re: [PATCH v2] virsh: add keepalive protocol in virsh

Guannan Ren <gren@xxxxxxxxxx> · Fri, 08 Jun 2012 18:56:18 +0800




On 06/08/2012 03:46 PM, Jiri Denemark wrote:
On Thu, Jun 07, 2012 at 12:03:02 -0600, Eric Blake wrote:
On 06/05/2012 02:36 AM, Guannan Ren wrote:
Bugzilla:https://bugzilla.redhat.com/show_bug.cgi?id=822839
add two general virsh options to support keepalive message protocol

-i | --keepalive_interval   interval time value (default 5 seconds)
-n | --keepalive_count      number of heartbeats (default 5 times)

For non-p2p migration, start keepalive for remote driver to
determine the status of network connection, aborting migrating job
after defined amount of interval time.
---
  tools/virsh.c |   88 +++++++++++++++++++++++++++++++++++++++++++++-----------
  1 files changed, 70 insertions(+), 18 deletions(-)
Missing a counterpart documentation in virsh.pod.

@@ -7213,6 +7219,12 @@ doMigrate (void *opaque)
          dconn = virConnectOpenAuth (desturi, virConnectAuthPtrDefault, 0);
          if (!dconn) goto out;

+        data->dconn = dconn;
+        if (virConnectSetKeepAlive(dconn,
+                                   ctl->keepalive_interval,
+                                   ctl->keepalive_count)<  0)
+             vshDebug(ctl, VSH_ERR_WARNING, "migrate: Failed to start keepalive\n");
This makes it impossible to migrate to an older server that lacks
virConnectSetKeepAlive() support.
virConnectSetKeepAlive returns 1 when remote party does not support keepalive,
which means the above code is perfectly compatible with older servers.

You need to avoid doing this if keepalive_interval and/or keepalive_count is
set to a value that avoids keepalive.  I think that also means you need to
distinguish between a normal default of 5 seconds with new servers but
unspecified by the user (where we just gracefully continue without
keepalive), compared to an explicit user request of 5 seconds (must fail if
the keepalive request cannot be honored), and compared to an explicit user
request of no keepalive.
But I agree that we need to distinguish between default keepalive settings and
those explicitly requested.

       If the server supports keepalive,   virConnectSetKeepAlive will 
success, otherwise failed.
       I think, on success the default value 5 seconds or the values 
from options
       --keepalive_interval and --keepalive_count will be used.
       On failure,  all value are not invalid even given by user 
explicitly.
       That means the keepalive is set up or not automatically by client.
       What the user will do is to override the default values or leave 
them untouched.





@@ -7329,6 +7342,13 @@ repoll:
              goto cleanup;
          }

+       if (data->dconn&&  virConnectIsAlive(data->dconn)<= 0) {
+            virDomainAbortJob(dom);
+            vshError(ctl, "%s",
+                      _("Lost connection to destination host"));
+            goto cleanup;
+        }
Again, need to be careful of older servers that lack virConnectIsAlive()
support.
This doesn't call to the server at all. However, virConnectIsAlive returning
-1 doesn't mean the connection is closed, especially if it sets the error to
VIR_ERR_NO_SUPPORT.

       virConnectIsAlive from remote driver is for determining the 
working of
       client, after certain time, the keepalive will close the client, 
we use the
       function to decide when to abort the migration job.



@@ -18681,6 +18703,18 @@ vshCommandRun(vshControl *ctl, const vshCmd *cmd)
          if (enable_timing)
              GETTIMEOFDAY(&before);

+        /* start keepalive for remote driver */
+        driver = virConnectGetType(ctl->conn);
+        if (STREQ_NULLABLE(driver, "QEMU") ||
+            STREQ_NULLABLE(driver, "xenlight") ||
+            STREQ_NULLABLE(driver, "UML") ||
+            STREQ_NULLABLE(driver, "LXC") ||
+            STREQ_NULLABLE(driver, "remote"))
Why are you limiting this to particular hypervisors?  That feels wrong.
  Rather, you should blindly attempt it, and gracefully detect when it is
unsupported.
Yes, and another issue is that vshCommandRun is a bad place to do this.
Keepalive should be enabled just after opening a connection, i.e., in
vshReconnect, cmdConnect, and vshInit (in addition to doMigrate, where you
correctly do so). I think virsh would benefit from a new wrapper for
virConnectOpenAuth that would also deal with keepalive and which would be
called from all four places.

Jirka

       I agree with the common function called from these four places 
to set
       keepalive up.


--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list