"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes: > On Wed, Mar 6, 2013 at 1:40 AM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: >> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes: >> >>> On Tue, Mar 5, 2013 at 7:41 AM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: >>>> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes: >>>> >>>>> Eric, >>>>> >>>>> On Mon, Mar 4, 2013 at 6:52 PM, Eric W. Biederman >>>>> <ebiederm@xxxxxxxxxxxx> wrote: >>>>>> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes: >>>>>> >>>>>>> On Fri, Mar 1, 2013 at 4:35 PM, Eric W. Biederman >>>>> <ebiederm@xxxxxxxxxxxx> wrote: >>>>>>>> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes: >>>>>>>> >>>>>>>>> Hi Rob, >>>>>>>>> >>>>>>>>> On Fri, Mar 1, 2013 at 5:01 AM, Rob Landley <rob@xxxxxxxxxxx> >>>>> wrote: >>>>>>>>>> On 02/28/2013 05:24:07 AM, Michael Kerrisk (man-pages) wrote: >>>>> [...] >>>>>>>>>>> Because the above unshare(2) and setns(2) calls only change the >>>>>>>>>>> PID namespace for created children, the clone(2) calls neces‐ >>>>>>>>>>> sarily put the new thread in a different PID namespace from the >>>>>>>>>>> calling thread. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Um, no they don't. They fail. That's the point. >>>>>>>>> >>>>>>>>> (Good catch.) >>>>>>>>> >>>>>>>>>> They _would_ put the new >>>>>>>>>> thread in a different PID namespace, which breaks the definition >>>>> of threads. >>>>>>>>>> >>>>>>>>>> How about: >>>>>>>>>> >>>>>>>>>> The above unshare(2) and setns(2) calls change the PID namespace >>>>> of >>>>>>>>>> children created by subsequent clone(2) calls, which is >>>>> incompatible >>>>>>>>>> with CLONE_VM. >>>>>>>>> >>>>>>>>> I decided on: >>>>>>>>> >>>>>>>>> The point here is that unshare(2) and setns(2) change the PID >>>>>>>>> namespace for created children but not for the calling process, >>>>>>>>> while clone(2) CLONE_VM specifies the creation of a new thread >>>>>>>>> in the same process. 
>>>>>>>>
>>>>>>>> Can we make that "for all new tasks created" instead of "created
>>>>>>>> children"?
>>>>>>>>
>>>>>>>> Otherwise someone might expect CLONE_THREAD would work, as
>>>>>>>> CLONE_THREAD creates a thread and not a child...
>>>>>>>
>>>>>>> The term "task" is kernel-space talk that rarely appears in man
>>>>>>> pages, so I am reluctant to use it.
>>>>>>
>>>>>> With respect to clone, and in this case, I am not certain we can
>>>>>> properly describe what happens without talking about tasks. But it
>>>>>> is worth a try.
>>>>>>
>>>>>>> How about this:
>>>>>>>
>>>>>>>     The point here is that unshare(2) and setns(2) change the PID
>>>>>>>     namespace for processes subsequently created by the caller, but
>>>>>>>     not for the calling process, while clone(2) CLONE_VM specifies
>>>>>>>     the creation of a new thread in the same process.
>>>>>>
>>>>>> Hmm. How about this.
>>>>>>
>>>>>>     The point here is that unshare(2) and setns(2) change the PID
>>>>>>     namespace that will be used in all subsequent calls to clone
>>>>>>     and fork by the caller, but not for the calling process, and
>>>>>>     that all threads in a process must share the same PID
>>>>>>     namespace. Which makes a subsequent clone(2) CLONE_VM
>>>>>>     specify the creation of a new thread in a different PID
>>>>>>     namespace but in the same process, which is impossible.
>>>>>
>>>>> I did a little tidying:
>>>>>
>>>>>     The point here is that unshare(2) and setns(2) change the
>>>>>     PID namespace that will be used in all subsequent calls
>>>>>     to clone(2) and fork(2), but do not change the PID
>>>>>     namespace of the calling process. Because a subsequent
>>>>>     clone(2) CLONE_VM would imply the creation of a new
>>>>>     thread in a different PID namespace, the operation is not
>>>>>     permitted.
>>>>>
>>>>> Okay?
>>>>
>>>> That seems reasonable.
>>>>
>>>> CLONE_THREAD might be better to talk about. The check is CLONE_VM
>>>> because it is easier and CLONE_THREAD implies CLONE_VM.
>>>>
>>>>> Having asked that, I realize that I'm still not quite comfortable with
>>>>> this text. I think the problem is really one of terminology. At the
>>>>> start of this passage in the page, there is the sentence:
>>>>>
>>>>>     Every thread in a process must be in the
>>>>>     same PID namespace.
>>>>>
>>>>> Can you define "thread" in this context?
>>>>
>>>> Most definitely a thread group created with CLONE_THREAD. It is pretty
>>>> ugly in just the old-fashioned CLONE_VM case too, but that might be
>>>> legal.
>>>>
>>>> In a few cases I think the implementation overshoots and tests for VM
>>>> sharing instead of thread group membership because VM sharing is easier
>>>> to test for, and we already have tests for that.
>>>
>>> So, in summary, the point is that CLONE_VM is being used as a proxy
>>> for CLONE_THREAD because the former is easier to test for, and
>>> CLONE_THREAD requires CLONE_VM, right?
>>
>> I am totally lost about what problem we are trying to resolve in
>> the text at this point. So I am taking this opportunity to review
>> what is actually happening and hopefully give a clear and useful
>> explanation.
>
> The problem is that the existing text talks about multithreaded
> processes needing to be in the same PID namespace and then jumps to
> talking about restrictions with CLONE_VM (not CLONE_THREAD). The
> reader may not realize that CLONE_VM is a near synonym for
> "multithreaded process".
>
> However, the text you provide here is wonderful detail:
>
>> The clone flags have some dependencies.
>> CLONE_SIGHAND requires CLONE_VM.
>> CLONE_THREAD requires CLONE_SIGHAND.
>>
>> Ultimately there are cases in here that are too strange to think about,
>> and that no one cares about (except so far to document what is going
>> on). The fundamental goal of these checks is to just not allow the
>> cases that are too strange to think about.
>>
>> From a technical point of view CLONE_THREAD requires being in the same
>> PID namespace so you can send signals to other threads in your process,
>> and you need to see in /proc all of the threads of your process.
>>
>> From a technical point of view CLONE_SIGHAND requires being in the same
>> PID namespace because we need to know how to encode the PID of the
>> sending process at the time a signal is enqueued in the destination
>> queue. A signal queue shared by processes in multiple PID namespaces
>> will defeat that.
>>
>> From a technical point of view CLONE_VM requires all of the threads to
>> be in the same PID namespace, because from the point of view of the
>> coredump code if two processes share the same address space they are
>> threads and will be core dumped together. When a coredump is written
>> the pid of each thread is written into the coredump. Writing the pids
>> could not meaningfully succeed if some of the pids were in a parent
>> PID namespace.
>>
>> Therefore there is a technical requirement for each of CLONE_THREAD,
>> CLONE_SIGHAND, and CLONE_VM to share a PID namespace.
>>
>> In the kernel code, testing only for CLONE_VM is a shorthand for
>> testing for any of CLONE_THREAD, CLONE_SIGHAND, or CLONE_VM.
>
> I will incorporate most of the above into the page.
>
>> On the flip side the addition by unshare(CLONE_NEWPID) of
>> unshare(CLONE_THREAD) actually appears to be bogus
>
> I agree that it seems strange.

Having looked at it a little more I will be removing the unnecessary
CLONE_THREAD check in 3.10.

Eric
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers
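
For readers following the thread, the flag dependencies and the CLONE_VM
shorthand discussed above can be sketched as a small model. This is only
an illustration, not the kernel's code: the function name
check_clone_flags and the pid_ns_pending parameter are hypothetical, and
the flag values are assumed to match <linux/sched.h>.

```python
# Flag values as defined in <linux/sched.h> (assumed unchanged).
CLONE_VM = 0x00000100
CLONE_SIGHAND = 0x00000800
CLONE_THREAD = 0x00010000


def check_clone_flags(flags: int, pid_ns_pending: bool) -> bool:
    """Hypothetical model of the checks described in the thread.

    pid_ns_pending is True when an earlier unshare(CLONE_NEWPID) or
    setns() changed the PID namespace that new children will use.
    Returns True if the flag combination would be accepted.
    """
    # CLONE_THREAD requires CLONE_SIGHAND.
    if (flags & CLONE_THREAD) and not (flags & CLONE_SIGHAND):
        return False
    # CLONE_SIGHAND requires CLONE_VM.
    if (flags & CLONE_SIGHAND) and not (flags & CLONE_VM):
        return False
    # After the two checks above, any flag set that contains
    # CLONE_THREAD or CLONE_SIGHAND also contains CLONE_VM, so
    # testing CLONE_VM alone covers all three cases.
    if pid_ns_pending and (flags & CLONE_VM):
        return False
    return True
```

Under this model a plain fork-style call (flags of 0) still succeeds
after the PID namespace has been changed, while any call that shares
the address space is refused.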