On Wed, Nov 13, 2013 at 02:53:05PM +0000, Daniel P. Berrange wrote:
> On Fri, Nov 08, 2013 at 02:42:26PM -0500, Rich Felker wrote:
> > On Fri, Nov 08, 2013 at 01:30:09PM +0800, Daniel P. Berrange wrote:
> > > On Thu, Nov 07, 2013 at 09:15:43PM +0800, Gao feng wrote:
> > > > I met a problem that container blocked by seteuid/setegid
> > > > which is call in lxcContainerSetID on UP system and libvirt
> > > > compiled with --with-fuse=yes.
> > > >
> > > > I looked into the glibc's codes, and found setxid in glibc
> > > > calls futex() to wait for other threads to change their
> > > > setxid_futex to 0(see setxid_mark_thread in glibc).
> > > >
> > > > since the process created by clone system call will not
> > > > share the memory with the other threads and the context
> > > > of memory doesn't changed until we call execl.(COW)
> > > >
> > > > So if the process which created by clone is called before
> > > > fuse thread being stated, the new setxid_futex of fuse
> > > > thread will not be saw in this process, it will be blocked
> > > > forever.
> > > >
> > > > Maybe this problem should be fixed in glibc, but I send
> > > > this patch as a quick fix.
> > >
> > > Can you show a stack trace of the threads/processes deadlocking
> >
> > I think this is a symptom of setxid not being async-signal-safe like
> > it's required to be. I'm not sure if we have a bug tracker entry for
> > that; if not, it should be added. But if clone() is being used except
> > in a fork-like manner, this is probably invalid application usage too.
>
> We are not using clone() in a manner that is strictly equivalent
> to fork(). Libvirt is using clone() to create Linux containers
> with new namespaces. eg we do
>
>   clone(CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWNET|SIGCHLD)

Understood. I still call this a fork-like manner since it's not sharing
VM or using CLONE_THREAD and using the default signal of SIGCHLD.
BTW, is there a reason to prefer this usage over a regular fork()
followed by unshare()?

> IIUC, if a process is multi-threaded you should restrict yourself to
> use of async signal safe functions in between fork() and exec(). I
> assume this restriction applies to clone() and exec() pairings too.
>
> Libvirt is in fact violating rules about only using async signal safe
> functions between clone() and exec() in many places. So I think what
> we need to do is avoid starting any threads in the parent until after
> we've clone()'d to create the new child namespace.

Per the specification (POSIX), setuid() is async-signal-safe (AS-safe).
However, glibc fails to meet this requirement (it's actually very hard
to meet due to limitations in how the Linux kernel manages uids/gids).
So for now, not starting any threads until after clone() has been
performed is probably a better solution than trying to eliminate calls
to non-AS-safe functions.

Rich

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
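
The ordering suggested above (create the child first, start threads
afterwards) can be sketched roughly as follows. This is only a minimal
illustration under stated assumptions, not libvirt's code: plain fork()
stands in for the fork-like clone() call, since the CLONE_NEW* flags
need privileges; helper_thread and spawn_child_then_start_thread are
hypothetical names; and the child sets its effective IDs to their
current values purely to exercise setegid/seteuid without privileges.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a helper thread such as the FUSE thread. */
static void *helper_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Create the child while the process is still single-threaded, and
 * only then start the helper thread in the parent.
 * Returns the child's exit status, or -1 on error.
 */
static int spawn_child_then_start_thread(void)
{
    /* Assumption: fork() stands in for clone(CLONE_NEW*|SIGCHLD). */
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: no other thread existed when it was created, so its
         * copy of the thread list names no thread that setegid/seteuid
         * would have to wait for. */
        if (setegid(getegid()) != 0 || seteuid(geteuid()) != 0)
            _exit(1);
        _exit(0);
    }

    /* Parent: start threads only now that the child already exists. */
    pthread_t tid;
    int thread_started =
        (pthread_create(&tid, NULL, helper_thread, NULL) == 0);
    if (thread_started)
        pthread_join(tid, NULL);

    /* Reap the child and report how it exited. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!thread_started || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Called from a process that has not started any threads yet, the
function is expected to return 0. On older glibc versions it may need
to be linked with -pthread.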