On Wed, Mar 6, 2013 at 1:40 AM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>
>> On Tue, Mar 5, 2013 at 7:41 AM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>>> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>>>
>>>> Eric,
>>>>
>>>> On Mon, Mar 4, 2013 at 6:52 PM, Eric W. Biederman
>>>> <ebiederm@xxxxxxxxxxxx> wrote:
>>>>> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>>>>>
>>>>>> On Fri, Mar 1, 2013 at 4:35 PM, Eric W. Biederman
>>>>>> <ebiederm@xxxxxxxxxxxx> wrote:
>>>>>>> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>>>>>>>
>>>>>>>> Hi Rob,
>>>>>>>>
>>>>>>>> On Fri, Mar 1, 2013 at 5:01 AM, Rob Landley <rob@xxxxxxxxxxx> wrote:
>>>>>>>>> On 02/28/2013 05:24:07 AM, Michael Kerrisk (man-pages) wrote:
>>>> [...]
>>>>>>>>>> Because the above unshare(2) and setns(2) calls only change the
>>>>>>>>>> PID namespace for created children, the clone(2) calls
>>>>>>>>>> necessarily put the new thread in a different PID namespace
>>>>>>>>>> from the calling thread.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Um, no they don't. They fail. That's the point.
>>>>>>>>
>>>>>>>> (Good catch.)
>>>>>>>>
>>>>>>>>> They _would_ put the new
>>>>>>>>> thread in a different PID namespace, which breaks the definition
>>>>>>>>> of threads.
>>>>>>>>>
>>>>>>>>> How about:
>>>>>>>>>
>>>>>>>>> The above unshare(2) and setns(2) calls change the PID namespace
>>>>>>>>> of children created by subsequent clone(2) calls, which is
>>>>>>>>> incompatible with CLONE_VM.
>>>>>>>>
>>>>>>>> I decided on:
>>>>>>>>
>>>>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>>>>> namespace for created children but not for the calling process,
>>>>>>>> while clone(2) CLONE_VM specifies the creation of a new thread
>>>>>>>> in the same process.
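A minimal C sketch of the behaviour described in the text above, assuming Linux, glibc's clone() wrapper, and a single-threaded caller with CAP_SYS_ADMIN (without it, unshare(CLONE_NEWPID) fails, typically with EPERM, and the sketch reports that nothing was tested). The helper probe_pidns_for_children() and struct pidns_probe are hypothetical names. The sketch checks three things: the caller's own PID does not change after unshare(CLONE_NEWPID); a thread-style clone() is refused (EINVAL is assumed to be the error); and the first child created by fork() is PID 1 in the new namespace. It asks for a full thread (CLONE_VM | CLONE_SIGHAND | CLONE_THREAD) rather than bare CLONE_VM, so the outcome does not hinge on exactly which of those flags the kernel tests.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* What the hypothetical probe observed; only meaningful if unshare_ok. */
struct pidns_probe {
    int unshare_ok;    /* unshare(CLONE_NEWPID) succeeded */
    pid_t pid_before;  /* caller's PID before unshare() */
    pid_t pid_after;   /* caller's PID after unshare() */
    int thread_errno;  /* errno from the thread-style clone(); 0 if not set */
    int child_is_init; /* first child from fork() saw getpid() == 1 */
};

/* Body for the new thread; it is never expected to run. */
static int thread_fn(void *arg)
{
    (void)arg;
    return 0;
}

/* Hypothetical helper. Needs CAP_SYS_ADMIN and a single-threaded caller,
 * and permanently changes the PID namespace used for the caller's future
 * children, so run it only in a throwaway process. */
static struct pidns_probe probe_pidns_for_children(void)
{
    struct pidns_probe r = {0};

    r.pid_before = getpid();
    if (unshare(CLONE_NEWPID) == -1)
        return r;                  /* e.g. EPERM without CAP_SYS_ADMIN */
    r.unshare_ok = 1;
    r.pid_after = getpid();        /* the caller itself has not moved */

    /* A new thread here would be in a different PID namespace from the
     * rest of its process, so the kernel should refuse to create it. */
    size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack != NULL) {
        int flags = CLONE_VM | CLONE_SIGHAND | CLONE_THREAD;
        if (clone(thread_fn, stack + stack_size, flags, NULL) == -1) {
            r.thread_errno = errno;
            free(stack);
        }
        /* On unexpected success the stack is deliberately leaked, since
         * the new thread may still be running on it. */
    }

    /* fork() still works: its first child is PID 1 in the new namespace. */
    pid_t child = fork();
    if (child == 0)
        _exit(getpid() == 1 ? 0 : 1);
    if (child > 0) {
        int status;
        if (waitpid(child, &status, 0) == child && WIFEXITED(status))
            r.child_is_init = (WEXITSTATUS(status) == 0);
    }
    return r;
}
```

Because the helper permanently changes where the caller's future children are placed, it is best called once from a short-lived process run as root; when r.unshare_ok is 0 the remaining fields carry no information.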
>>>>>>>
>>>>>>> Can we make that "for all new tasks created" instead of "created
>>>>>>> children"
>>>>>>>
>>>>>>> Otherwise someone might expect CLONE_THREAD would work, as
>>>>>>> CLONE_THREAD creates a thread and not a child...
>>>>>>
>>>>>> The term "task" is kernel-space talk that rarely appears in man pages,
>>>>>> so I am reluctant to use it.
>>>>>
>>>>> With respect to clone and in this case I am not certain we can properly
>>>>> describe what happens without talking about tasks. But it is worth
>>>>> a try.
>>>>>
>>>>>
>>>>>> How about this:
>>>>>>
>>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>>> namespace for processes subsequently created by the caller, but
>>>>>> not for the calling process, while clone(2) CLONE_VM specifies
>>>>>> the creation of a new thread in the same process.
>>>>>
>>>>> Hmm. How about this.
>>>>>
>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>> namespace that will be used in all subsequent calls to clone
>>>>> and fork by the caller, but not for the calling process, and
>>>>> that all threads in a process must share the same PID
>>>>> namespace. Which makes a subsequent clone(2) CLONE_VM
>>>>> specify the creation of a new thread in a different PID
>>>>> namespace but in the same process, which is impossible.
>>>>
>>>> I did a little tidying:
>>>>
>>>> The point here is that unshare(2) and setns(2) change the
>>>> PID namespace that will be used in all subsequent calls
>>>> to clone(2) and fork(2), but do not change the PID
>>>> namespace of the calling process. Because a subsequent
>>>> clone(2) CLONE_VM would imply the creation of a new
>>>> thread in a different PID namespace, the operation is not
>>>> permitted.
>>>>
>>>> Okay?
>>>
>>> That seems reasonable.
>>>
>>> CLONE_THREAD might be better to talk about. The check is CLONE_VM
>>> because it is easier and CLONE_THREAD implies CLONE_VM.
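The claim that testing CLONE_VM alone is enough can be checked exhaustively against the flag dependencies given in this thread (CLONE_SIGHAND requires CLONE_VM; CLONE_THREAD requires CLONE_SIGHAND). The following is a hypothetical model of those rules in C, not the kernel's code, and the function names are illustrative: for every combination of the three flags that passes the dependency checks, "is CLONE_VM set?" gives the same answer as "is any of CLONE_THREAD, CLONE_SIGHAND, CLONE_VM set?".

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>

/* Hypothetical model of the dependency rules quoted in this thread. */
static bool flags_consistent(unsigned long flags)
{
    if ((flags & CLONE_THREAD) && !(flags & CLONE_SIGHAND))
        return false;              /* CLONE_THREAD requires CLONE_SIGHAND */
    if ((flags & CLONE_SIGHAND) && !(flags & CLONE_VM))
        return false;              /* CLONE_SIGHAND requires CLONE_VM */
    return true;
}

/* "Must the child share the caller's PID namespace?", asked in full... */
static bool needs_same_pidns_full(unsigned long flags)
{
    return (flags & (CLONE_THREAD | CLONE_SIGHAND | CLONE_VM)) != 0;
}

/* ...and with the CLONE_VM shorthand. */
static bool needs_same_pidns_short(unsigned long flags)
{
    return (flags & CLONE_VM) != 0;
}

/* Try all 8 combinations of the three flags; the shorthand must agree
 * with the full test on every combination that passes the checks. */
static bool shorthand_matches(void)
{
    const unsigned long bits[3] = { CLONE_VM, CLONE_SIGHAND, CLONE_THREAD };

    for (unsigned mask = 0; mask < 8; mask++) {
        unsigned long flags = 0;
        for (int i = 0; i < 3; i++)
            if (mask & (1u << i))
                flags |= bits[i];
        if (flags_consistent(flags) &&
            needs_same_pidns_full(flags) != needs_same_pidns_short(flags))
            return false;
    }
    return true;
}
```

shorthand_matches() returns true. The combinations where the two tests disagree (CLONE_SIGHAND or CLONE_THREAD without CLONE_VM) are all ones that the dependency checks already reject.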
>>>
>>>> Having asked that, I realize that I'm still not quite comfortable with
>>>> this text. I think the problem is really one of terminology. At the
>>>> start of this passage in the page, there is the sentence:
>>>>
>>>> Every thread in a process must be in the
>>>> same PID namespace.
>>>>
>>>> Can you define "thread" in this context?
>>>
>>> Most definitely a thread group created with CLONE_THREAD. It is pretty
>>> ugly in just the old fashioned CLONE_VM case too, but that might be
>>> legal.
>>>
>>> In a few cases I think the implementation overshoots and tests for VM
>>> sharing instead of thread group membership because VM sharing is easier
>>> to test for, and we already have tests for that.
>>
>> So, in summary, the point is that CLONE_VM is being used as a proxy
>> for CLONE_THREAD because the former is easier to test for, and
>> CLONE_THREAD requires CLONE_VM, right?
>
> I am totally lost about what problem we are trying to resolve in
> the text at this point. So I am taking this opportunity to review
> what is actually happening and hopefully give a clear and useful
> explanation.

The problem is that the existing text talks about multithreaded
processes needing to be in the same PID namespace and then jumps to
talking about restrictions with CLONE_VM (not CLONE_THREAD). The
reader may not realize that CLONE_VM is a near synonym for
"multithreaded process". However, the text you provide here gives
wonderful detail:

> The clone flags have some dependencies.
> CLONE_SIGHAND requires CLONE_VM.
> CLONE_THREAD requires CLONE_SIGHAND.
>
> Ultimately there are cases in here that are too strange to think about,
> and that no one cares about (except so far to document what is going on).
> The fundamental goal of these checks is to just not allow the cases that
> are too strange to think about.
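The flag dependencies listed above can be observed from user space without any privileges. A minimal sketch, assuming Linux and glibc's clone() wrapper; try_clone_flags() is a hypothetical helper that returns the errno set by clone(), or 0 after reaping the child when clone() succeeds. Combinations that violate a dependency are expected to be rejected with EINVAL before any child is created.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* The child does nothing and exits immediately. */
static int child_fn(void *arg)
{
    (void)arg;
    return 0;
}

/* Hypothetical helper: returns 0 if clone() accepted `flags` (the child
 * is reaped before returning), otherwise the errno clone() set. Do not
 * pass a CLONE_THREAD combination that could succeed: a new thread
 * cannot be reaped with waitpid(). */
static int try_clone_flags(int flags)
{
    size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return ENOMEM;

    /* SIGCHLD as the termination signal makes a successful child waitable. */
    pid_t pid = clone(child_fn, stack + stack_size, flags | SIGCHLD, NULL);
    if (pid == -1) {
        int err = errno;
        free(stack);
        return err;
    }
    waitpid(pid, NULL, 0);  /* the child has exited, so its stack is unused */
    free(stack);
    return 0;
}
```

For example, try_clone_flags(CLONE_SIGHAND) (no CLONE_VM) and try_clone_flags(CLONE_VM | CLONE_THREAD) (no CLONE_SIGHAND) should both return EINVAL, while try_clone_flags(0) creates and reaps an ordinary child and returns 0.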
>
> From a technical point of view CLONE_THREAD requires being in the same
> PID namespace so you can send signals to other threads in your process,
> and you need to see in proc all of the threads of your process.
>
> From a technical point of view CLONE_SIGHAND requires being in the same
> PID namespace because we need to know how to encode the PID of the
> sending process at the time a signal is enqueued in the destination
> queue. A signal queue shared by processes in multiple PID namespaces
> will defeat that.
>
> From a technical point of view CLONE_VM requires all of the threads to
> be in the same PID namespace, because from the point of view of coredump
> code if two processes share the same address space they are threads and
> will be core dumped together. When a coredump is written the pid of each
> thread is written into the coredump. Writing the pids could not
> meaningfully succeed if some of the pids were in a parent PID namespace.
>
> Therefore there is a technical requirement for each of CLONE_THREAD,
> CLONE_SIGHAND, CLONE_VM to share a PID namespace.
>
> In the code in the kernel testing only for CLONE_VM is a shorthand for
> testing for any of CLONE_THREAD, CLONE_SIGHAND, or CLONE_VM.

I will incorporate most of the above into the page.

> On the flip side the addition by unshare(CLONE_NEWPID) of
> unshare(CLONE_THREAD) actually appears to be bogus

I agree that it seems strange.

Cheers,

Michael

> because we do not
> change the pid namespace of the process calling unshare (only its
> children), and we already allow that case with setns. I need to think
> about that case a little more but I am going to queue up a patch for
> 3.10 to make unshare(CLONE_NEWPID) and setns(CLONE_NEWPID) consistent.
> Probably by removing the check in unshare(CLONE_NEWPID).
>
> I need to think a bit about what happens from the threaded parent's
> perspective when different threads can create children in different PID
> namespaces. Does it introduce weird hard-to-support cases into the code?
> Or will it just work without requiring anything special and I can allow
> it.
>
> Eric

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers