On Fri, Sep 30, 2011 at 11:00:53AM -0500, Serge Hallyn wrote:
> Quoting Daniel P. Berrange (berrange@xxxxxxxxxx):
> > On Thu, Sep 29, 2011 at 10:12:17PM -0500, Serge E. Hallyn wrote:
> > > Quoting Daniel P. Berrange (berrange@xxxxxxxxxx):
> > > > On Wed, Sep 28, 2011 at 02:14:52PM -0500, Serge E. Hallyn wrote:
> > > > > Nova (openstack) calls libvirt to create a container, then
> > > > > periodically checks using GetInfo to see whether the container
> > > > > is up. If it does this too quickly, then libvirt returns an
> > > > > error, which in libvirt.py causes an exception to be raised,
> > > > > the same type as if the container was bad.
> > > > lxcDomainGetInfo(), holds a mutex on 'dom' for the duration of
> > > > its execution. It checks for virDomainObjIsActive() before
> > > > trying to use the cgroups.
> > > Yes, it does, but
> > > > lxcDomainStart(), holds the mutex on 'dom' for the duration of
> > > > its execution, and does not return until the container is running
> > > > and cgroups are present.
> > > No. It calls the lxc_controller with --background. The controller
> > > main task in turn exits before the cgroups have been set up. There
> > > is the race.
> > The lxcDomainStart() method isn't actually waiting on the child
> > pid directly, so the --background flag ought not to matter. We
> > have a pipe that we pass into the controller, which we wait on
> > for a notification after running the process. The controller
> > does not notify the 'handshake' FD until after cgroups have
> > been setup, unless I'm mis-interpreting our code
> That's the call to lxcContainerWaitForContinue(), right? If so, that's
> done by lxcContainerChild(), which is called by the lxc_controller.
> AFAICS there is nothing in the lxc_driver which will wait on that
> before dropping the driver->lock mutex.

In lxcVmStart(), which runs while driver->lock is held, we have the
following section of code in play:

....
    if (virCommandRun(cmd, NULL) < 0)
        goto cleanup;

    if (VIR_CLOSE(handshakefds[1]) < 0) {
        virReportSystemError(errno, "%s", _("could not close handshake fd"));
        goto cleanup;
    }

    /* Connect to the controller as a client *first* because
     * this will block until the child has written their
     * pid file out to disk */
    if ((priv->monitor = lxcMonitorClient(driver, vm)) < 0)
        goto cleanup;

    /* And get its pid */
    if ((r = virPidFileRead(driver->stateDir, vm->def->name, &vm->pid)) < 0) {
        virReportSystemError(-r,
                             _("Failed to read pid file %s/%s.pid"),
                             driver->stateDir, vm->def->name);
        goto cleanup;
    }

    vm->def->id = vm->pid;
    virDomainObjSetState(vm, VIR_DOMAIN_RUNNING, reason);

    if (lxcContainerWaitForContinue(handshakefds[0]) < 0) {
....

The 'virCommandRun' is where the libvirt_lxc controller is forked (in the
background). The main libvirt LXC driver code then blocks on the
'lxcContainerWaitForContinue(handshakefds[0])' line, waiting for the
controller to finish initializing.
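For clarity, the pattern in play here is just a pipe handshake: the parent
blocks in read() on the handshake fd and only proceeds once the child writes
to it after its setup has completed. The following is a minimal standalone
sketch of that pattern, not the libvirt code itself; all names are made up to
mirror the excerpt above:

/* handshake-sketch.c - a minimal sketch, NOT the libvirt code.  It only
 * demonstrates the pipe handshake pattern described above: the parent
 * blocks in read() on the handshake fd, and the child writes to it only
 * once its setup work (standing in for cgroup creation etc.) is done.
 * All names are hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    int handshakefds[2];           /* [0] parent reads, [1] child writes */
    char msg = '1';
    pid_t child;

    if (pipe(handshakefds) < 0) {
        perror("pipe");
        return EXIT_FAILURE;
    }

    if ((child = fork()) < 0) {
        perror("fork");
        return EXIT_FAILURE;
    }

    if (child == 0) {
        /* "controller" side: do the setup work first ... */
        close(handshakefds[0]);
        sleep(2);                  /* stand-in for cgroup/container setup */

        /* ... and only then notify the parent
         * (cf. lxcContainerSendContinue) */
        if (write(handshakefds[1], &msg, 1) != 1)
            _exit(EXIT_FAILURE);
        _exit(EXIT_SUCCESS);
    }

    /* "driver" side: block here until the child reports that its
     * setup is complete (cf. lxcContainerWaitForContinue) */
    close(handshakefds[1]);
    if (read(handshakefds[0], &msg, 1) != 1) {
        perror("read");
        return EXIT_FAILURE;
    }
    printf("child signalled that setup is complete\n");

    waitpid(child, NULL, 0);
    return EXIT_SUCCESS;
}

In the real driver the notification is only sent after the container
resources and cgroups have been set up, which is the point of the code
walked through below.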
The LXC controller 'main' method receives the handshake FD and invokes
lxcControllerRun(). This method does various setup tasks, in particular
the following:

....
    if (lxcSetContainerResources(def) < 0)
        goto cleanup;

    ...

    if ((container = lxcContainerStart(def, nveths, veths,
                                       control[1],
                                       containerhandshake[1],
                                       containerPtyPath)) < 0)
        goto cleanup;
    VIR_FORCE_CLOSE(control[1]);
    VIR_FORCE_CLOSE(containerhandshake[1]);

    if (lxcControllerMoveInterfaces(nveths, veths, container) < 0)
        goto cleanup;

    if (lxcContainerSendContinue(control[0]) < 0) {
        virReportSystemError(errno, "%s",
                             _("Unable to send container continue message"));
        goto cleanup;
    }

    if (lxcContainerWaitForContinue(containerhandshake[0]) < 0) {
        virReportSystemError(errno, "%s",
                             _("error receiving signal from container"));
        goto cleanup;
    }

    /* Now the container is fully setup... */

....

    /* ...and reduce our privileges */
    if (lxcControllerClearCapabilities() < 0)
        goto cleanup;

    if (lxcContainerSendContinue(handshakefd) < 0) {
        virReportSystemError(errno, "%s",
                             _("error sending continue signal to parent"));
        goto cleanup;
    }
    VIR_FORCE_CLOSE(handshakefd);

lxcSetContainerResources() is what creates the cgroups. The very last thing
we do, once everything is configured and the 'init' process is finally
running, is to notify the handshake FD via
'lxcContainerSendContinue(handshakefd)', which finally allows the libvirtd
LXC code in lxcVmStart() to continue running and return to the client. Only
at that point is it possible for other API calls to be made which touch the
cgroups.

I'm trying to reproduce the kind of race condition problem scenario you
describe using the following sequence of commands:

  # virsh -c lxc:/// start helloworld ; virsh -c lxc:/// dominfo helloworld ; virsh -c lxc:/// destroy helloworld
  Domain helloworld started

  Id:             7015
  Name:           helloworld
  UUID:           a099376e-a803-ca94-f99c-d9a8f9a30088
  OS Type:        exe
  State:          running
  CPU(s):         1
  CPU time:       0.0s
  Max memory:     102400 kB
  Used memory:    280 kB
  Persistent:     yes
  Autostart:      disable
  Managed save:   unknown

  Domain helloworld destroyed

Even if I add a 'sleep(10)' statement as the first line of the
lxcSetContainerResources() method, I can't seem to trigger any race wrt
cgroup creation & virsh dominfo.

Is there a better test case you can show me to reproduce what you're seeing?

Regards,
Daniel

-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list