DB> +/**
DB> + * lxcControllerMoveInterfaces
DB> + * @nveths: number of interfaces
DB> + * @veths: interface names
DB> + * @container: pid of container
DB> + *
DB> + * Moves network interfaces into a container's namespace
DB> + *
DB> + * Returns 0 on success or -1 in case of error
DB> + */
DB> +static int lxcControllerMoveInterfaces(int nveths,
DB> +                                       char **veths,
DB> +                                       pid_t container)
DB> +{
DB> +    int i;
DB> +    for (i = 0 ; i < nveths ; i++)
DB> +        if (moveInterfaceToNetNs(veths[i], container) < 0) {
DB> +            lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
DB> +                     _("failed to move interface %s to ns %d"),
DB> +                     veths[i], container);
DB> +            return -1;
DB> +        }
DB> +
DB> +    return 0;
DB> +}

I'm not sure why, but the call to this function causes a failure on my
system.  I think that it's related to luck somehow, so I don't think it's
something that was really introduced by this set, but it prevents me from
starting a container with a network interface.

I've tracked it down to the actual virRun() of the 'ip' command.  If I
comment out the line that virRun()'s ip, everything works fine.  However,
if it actually gets run, the container never receives the continue message.

I added a loop in lxcWaitForContinue() that reads the socket
character-by-character instead of bailing out if the first character isn't
'c'.  The result was a bunch of control characters followed by 'c'.  Based
on these two facts, I would tend to guess that somehow the exec of ip is
causing some terminal stuff (yes, that's my technical term for it) to be
written to the socket ahead of the continue message.

I changed 'ip' to 'true' in moveInterfaceToNetNs() and it behaves the same
(which absolves ip itself).  I removed the virRun() and replaced with a
fork()..exec() and it behaves the same.  I set FD_CLOEXEC on the socket
pair, and it behaves the same.  I have no idea (or rather, I'm out of
ideas) why this is happening.  I assume you're not seeing it because
you're not playing with network-enabled guests.
DB> +int lxcControllerStart(lxc_vm_def_t *def,
DB> +                       int nveths,
DB> +                       char **veths,
DB> +                       int monitor,
DB> +                       int appPty,
DB> +                       const char *logfile)
DB> +{

<snip>

DB> +    if ((logfd = open(logfile, O_WRONLY | O_TRUNC)) < 0)

You need O_CREAT here.

DB> @@ -590,82 +578,75 @@

<snip>

DB> -    if (0 != (rc = lxcSetupInterfaces(conn, vm))) {

You changed the condition from "nonzero" here...

<snip>

DB> +    if (lxcSetupInterfaces(conn, vm->def, &nveths, &veths) < 0)

...to "negative" here, which makes the start process not notice some of
the failure paths in that function, and thus will erroneously return
success.

In general, I think this is an excellent approach, but I don't really like
how many (dozens of) failure points there are in the controller and
container setup procedure that result in a silent failure.  Almost all of
them leave libvirtd thinking that the container/controller is running, but
in fact, it _exit(1)'d long ago.

--
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@xxxxxxxxxx
--
Libvir-list mailing list
Libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list