On Fri, Sep 30, 2011 at 11:00:53AM -0500, Serge Hallyn wrote:
> Quoting Daniel P. Berrange (berrange@xxxxxxxxxx):
> > On Thu, Sep 29, 2011 at 10:12:17PM -0500, Serge E. Hallyn wrote:
> > > Quoting Daniel P. Berrange (berrange@xxxxxxxxxx):
> > > > On Wed, Sep 28, 2011 at 02:14:52PM -0500, Serge E. Hallyn wrote:
> > > > > Nova (openstack) calls libvirt to create a container, then
> > > > > periodically checks using GetInfo to see whether the container
> > > > > is up. If it does this too quickly, then libvirt returns an
> > > > > error, which in libvirt.py causes an exception to be raised,
> > > > > the same type as if the container was bad.
> > > > lxcDomainGetInfo(), holds a mutex on 'dom' for the duration of
> > > > its execution. It checks for virDomainObjIsActive() before
> > > > trying to use the cgroups.
> > > Yes, it does, but
> > > > lxcDomainStart(), holds the mutex on 'dom' for the duration of
> > > > its execution, and does not return until the container is running
> > > > and cgroups are present.
> > > No. It calls the lxc_controller with --background. The controller
> > > main task in turn exits before the cgroups have been set up. There
> > > is the race.
> > The lxcDomainStart() method isn't actually waiting on the child
> > pid directly, so the --background flag ought not to matter. We
> > have a pipe that we pass into the controller, which we wait on
> > for a notification after running the process. The controller
> > does not notify the 'handshake' FD until after cgroups have
> > been setup, unless I'm mis-interpreting our code
> That's the call to lxcContainerWaitForContinue(), right? If so, that's
> done by lxcContainerChild(), which is called by the lxc_controller.
> AFAICS there is nothing in the lxc_driver which will wait on that
> before dropping the driver->lock mutex.

In lxcVmStart(), which runs while driver->lock is held, we have the
following section of code in play:

....
    if (virCommandRun(cmd, NULL) < 0)
        goto cleanup;

    if (VIR_CLOSE(handshakefds[1]) < 0) {
        virReportSystemError(errno, "%s", _("could not close handshake fd"));
        goto cleanup;
    }

    /* Connect to the controller as a client *first* because
     * this will block until the child has written their
     * pid file out to disk */
    if ((priv->monitor = lxcMonitorClient(driver, vm)) < 0)
        goto cleanup;

    /* And get its pid */
    if ((r = virPidFileRead(driver->stateDir, vm->def->name, &vm->pid)) < 0) {
        virReportSystemError(-r,
                             _("Failed to read pid file %s/%s.pid"),
                             driver->stateDir, vm->def->name);
        goto cleanup;
    }

    vm->def->id = vm->pid;
    virDomainObjSetState(vm, VIR_DOMAIN_RUNNING, reason);

    if (lxcContainerWaitForContinue(handshakefds[0]) < 0) {
....

The 'virCommandRun' is where the libvirt_lxc controller is forked (in the
background). The main libvirt LXC driver code then blocks on the
'lxcContainerWaitForContinue(handshakefds[0])' line, waiting for the
controller to finish initializing.
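For clarity, the pattern in play here is just a pipe handshake: the parent
blocks in read() on the handshake fd and only proceeds once the child writes
to it after its setup has completed. The following is a minimal standalone
sketch of that pattern, not the libvirt code itself; all names are made up to
mirror the excerpt above:

/* handshake-sketch.c - a minimal sketch, NOT the libvirt code.  It only
 * demonstrates the pipe handshake pattern described above: the parent
 * blocks in read() on the handshake fd, and the child writes to it only
 * once its setup work (standing in for cgroup creation etc.) is done.
 * All names are hypothetical. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    int handshakefds[2];           /* [0] parent reads, [1] child writes */
    char msg = '1';
    pid_t child;

    if (pipe(handshakefds) < 0) {
        perror("pipe");
        return EXIT_FAILURE;
    }

    if ((child = fork()) < 0) {
        perror("fork");
        return EXIT_FAILURE;
    }

    if (child == 0) {
        /* "controller" side: do the setup work first ... */
        close(handshakefds[0]);
        sleep(2);                  /* stand-in for cgroup/container setup */

        /* ... and only then notify the parent
         * (cf. lxcContainerSendContinue) */
        if (write(handshakefds[1], &msg, 1) != 1)
            _exit(EXIT_FAILURE);
        _exit(EXIT_SUCCESS);
    }

    /* "driver" side: block here until the child reports that its
     * setup is complete (cf. lxcContainerWaitForContinue) */
    close(handshakefds[1]);
    if (read(handshakefds[0], &msg, 1) != 1) {
        perror("read");
        return EXIT_FAILURE;
    }
    printf("child signalled that setup is complete\n");

    waitpid(child, NULL, 0);
    return EXIT_SUCCESS;
}

In the real driver the notification is only sent after the container
resources and cgroups have been set up, which is the point of the code
walked through below.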
The LXC controller 'main' method receives the handshake FD and invokes
lxcControllerRun(). This method does various setup tasks, in particular
the following:

....
    if (lxcSetContainerResources(def) < 0)
        goto cleanup;

    ...

    if ((container = lxcContainerStart(def, nveths, veths,
                                       control[1],
                                       containerhandshake[1],
                                       containerPtyPath)) < 0)
        goto cleanup;
    VIR_FORCE_CLOSE(control[1]);
    VIR_FORCE_CLOSE(containerhandshake[1]);

    if (lxcControllerMoveInterfaces(nveths, veths, container) < 0)
        goto cleanup;

    if (lxcContainerSendContinue(control[0]) < 0) {
        virReportSystemError(errno, "%s",
                             _("Unable to send container continue message"));
        goto cleanup;
    }

    if (lxcContainerWaitForContinue(containerhandshake[0]) < 0) {
        virReportSystemError(errno, "%s",
                             _("error receiving signal from container"));
        goto cleanup;
    }

    /* Now the container is fully setup... */

....

    /* ...and reduce our privileges */
    if (lxcControllerClearCapabilities() < 0)
        goto cleanup;

    if (lxcContainerSendContinue(handshakefd) < 0) {
        virReportSystemError(errno, "%s",
                             _("error sending continue signal to parent"));
        goto cleanup;
    }
    VIR_FORCE_CLOSE(handshakefd);

lxcSetContainerResources() is what creates the cgroups. The very last thing
we do, once everything is configured and the 'init' process is finally
running, is to notify the handshake FD via
'lxcContainerSendContinue(handshakefd)', which finally allows the libvirtd
LXC code in lxcVmStart() to continue running and return to the client. Only
at that point is it possible for other API calls to be made which touch the
cgroups.

I'm trying to reproduce the kind of race condition problem scenario you
describe using the following sequence of commands:

  # virsh -c lxc:/// start helloworld ; virsh -c lxc:/// dominfo helloworld ; virsh -c lxc:/// destroy helloworld
  Domain helloworld started

  Id:             7015
  Name:           helloworld
  UUID:           a099376e-a803-ca94-f99c-d9a8f9a30088
  OS Type:        exe
  State:          running
  CPU(s):         1
  CPU time:       0.0s
  Max memory:     102400 kB
  Used memory:    280 kB
  Persistent:     yes
  Autostart:      disable
  Managed save:   unknown

  Domain helloworld destroyed

Even if I add a 'sleep(10)' statement as the first line of the
lxcSetContainerResources() method, I can't seem to trigger any race wrt
cgroup creation & virsh dominfo.

Is there a better test case you can show me to reproduce what you're seeing?

Regards,
Daniel

-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list