Re: [PATCH v2] virsh: add keepalive protocol in virsh

Jiri Denemark <jdenemar@xxxxxxxxxx> · Fri, 8 Jun 2012 09:46:08 +0200

On Thu, Jun 07, 2012 at 12:03:02 -0600, Eric Blake wrote:
> On 06/05/2012 02:36 AM, Guannan Ren wrote:
> > Bugzilla:https://bugzilla.redhat.com/show_bug.cgi?id=822839
> > add two general virsh options to support keepalive message protocol
> > 
> > -i | --keepalive_interval   interval time value (default 5 seconds)
> > -n | --keepalive_count      number of heartbeats (default 5 times)
> > 
> > For non-p2p migration, start keepalive for remote driver to
> > determine the status of network connection, aborting migrating job
> > after defined amount of interval time.
> > ---
> >  tools/virsh.c |   88 +++++++++++++++++++++++++++++++++++++++++++++-----------
> >  1 files changed, 70 insertions(+), 18 deletions(-)
> 
> Missing a counterpart documentation in virsh.pod.
> 
> > @@ -7213,6 +7219,12 @@ doMigrate (void *opaque)
> >          dconn = virConnectOpenAuth (desturi, virConnectAuthPtrDefault, 0);
> >          if (!dconn) goto out;
> >  
> > +        data->dconn = dconn;
> > +        if (virConnectSetKeepAlive(dconn,
> > +                                   ctl->keepalive_interval,
> > +                                   ctl->keepalive_count) < 0)
> > +             vshDebug(ctl, VSH_ERR_WARNING, "migrate: Failed to start keepalive\n");
> 
> This makes it impossible to migrate to an older server that lacks
> virConnectSetKeepAlive() support.

virConnectSetKeepAlive returns 1 when remote party does not support keepalive,
which means the above code is perfectly compatible with older servers.

> You need to avoid doing this if keepalive_interval and/or keepalive_count is
> set to a value that avoids keepalive.  I think that also means you need to
> distinguish between a normal default of 5 seconds with new servers but
> unspecified by the user (where we just gracefully continue without
> keepalive), compared to an explicit user request of 5 seconds (must fail if
> the keepalive request cannot be honored), and compared to an explicit user
> request of no keepalive.

But I agree that we need to distinguish between default keepalive settings and
those explicitly requested.

> > @@ -7329,6 +7342,13 @@ repoll:
> >              goto cleanup;
> >          }
> >  
> > +       if (data->dconn && virConnectIsAlive(data->dconn) <= 0) {
> > +            virDomainAbortJob(dom);
> > +            vshError(ctl, "%s",
> > +                      _("Lost connection to destination host"));
> > +            goto cleanup;
> > +        }
> 
> Again, need to be careful of older servers that lack virConnectIsAlive()
> support.

This doesn't call to the server at all. However, virConnectIsAlive returning
-1 doesn't mean the connection is closed, especially if it sets the error to
VIR_ERR_NO_SUPPORT.

> > @@ -18681,6 +18703,18 @@ vshCommandRun(vshControl *ctl, const vshCmd *cmd)
> >          if (enable_timing)
> >              GETTIMEOFDAY(&before);
> >  
> > +        /* start keepalive for remote driver */
> > +        driver = virConnectGetType(ctl->conn);
> > +        if (STREQ_NULLABLE(driver, "QEMU") ||
> > +            STREQ_NULLABLE(driver, "xenlight") ||
> > +            STREQ_NULLABLE(driver, "UML") ||
> > +            STREQ_NULLABLE(driver, "LXC") ||
> > +            STREQ_NULLABLE(driver, "remote"))
> 
> Why are you limiting this to particular hypervisors?  That feels wrong.
>  Rather, you should blindly attempt it, and gracefully detect when it is
> unsupported.

Yes, and another issue is that vshCommandRun is a bad place to do this.
Keepalive should be enabled just after opening a connection, i.e., in
vshReconnect, cmdConnect, and vshInit (in addition to doMigrate, where you
correctly do so). I think virsh would benefit from a new wrapper for
virConnectOpenAuth that would also deal with keepalive and which would be
called from all four places.

Jirka

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list