Re: For review: pid_namespaces(7) man page

On Wed, Mar 6, 2013 at 1:40 AM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>
>> On Tue, Mar 5, 2013 at 7:41 AM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>>> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>>>
>>>> Eric,
>>>>
>>>> On Mon, Mar 4, 2013 at 6:52 PM, Eric W. Biederman
>>>> <ebiederm@xxxxxxxxxxxx> wrote:
>>>>> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>>>>>
>>>>>> On Fri, Mar 1, 2013 at 4:35 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>>>>>>> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>>>>>>>
>>>>>>>> Hi Rob,
>>>>>>>>
>>>>>>>> On Fri, Mar 1, 2013 at 5:01 AM, Rob Landley <rob@xxxxxxxxxxx> wrote:
>>>>>>>>> On 02/28/2013 05:24:07 AM, Michael Kerrisk (man-pages) wrote:
>>>> [...]
>>>>>>>>>> Because the above unshare(2) and setns(2) calls only change the
>>>>>>>>>> PID namespace for created children, the clone(2) calls
>>>>>>>>>> necessarily put the new thread in a different PID namespace from
>>>>>>>>>> the calling thread.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Um, no they don't. They fail. That's the point.
>>>>>>>>
>>>>>>>> (Good catch.)
>>>>>>>>
>>>>>>>>> They _would_ put the new
>>>>>>>>> thread in a different PID namespace, which breaks the definition
>>>>>>>>> of threads.
>>>>>>>>>
>>>>>>>>> How about:
>>>>>>>>>
>>>>>>>>> The above unshare(2) and setns(2) calls change the PID namespace of
>>>>>>>>> children created by subsequent clone(2) calls, which is incompatible
>>>>>>>>> with CLONE_VM.
>>>>>>>>
>>>>>>>> I decided on:
>>>>>>>>
>>>>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>>>>> namespace for created children but not for the calling process,
>>>>>>>> while clone(2) CLONE_VM specifies the creation of a new thread
>>>>>>>> in the same process.
>>>>>>>
>>>>>>> Can we make that "for all new tasks created" instead of "created
>>>>>>> children"
>>>>>>>
>>>>>>> Otherwise someone might expect CLONE_THREAD to work, as
>>>>>>> CLONE_THREAD creates a thread and not a child...
>>>>>>
>>>>>> The term "task" is kernel-space talk that rarely appears in man
>>>>>> pages, so I am reluctant to use it.
>>>>>
>>>>> With respect to clone and in this case I am not certain we can
>>>>> properly describe what happens without talking about tasks. But it
>>>>> is worth a try.
>>>>>
>>>>>
>>>>>> How about this:
>>>>>>
>>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>>> namespace for processes subsequently created by the caller, but
>>>>>> not for the calling process, while clone(2) CLONE_VM specifies
>>>>>> the creation of a new thread in the same process.
>>>>>
>>>>> Hmm. How about this.
>>>>>
>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>> namespace that will be used in all subsequent calls to clone
>>>>> and fork by the caller, but not for the calling process, and
>>>>> that all threads in a process must share the same PID
>>>>> namespace. This makes a subsequent clone(2) CLONE_VM
>>>>> specify the creation of a new thread in a different PID
>>>>> namespace but in the same process, which is impossible.
>>>>
>>>> I did a little tidying:
>>>>
>>>> The point here is that unshare(2) and setns(2) change the
>>>> PID namespace that will be used in all subsequent calls
>>>> to clone(2) and fork(2), but do not change the PID
>>>> namespace of the calling process. Because a subsequent
>>>> clone(2) CLONE_VM would imply the creation of a new
>>>> thread in a different PID namespace, the operation is not
>>>> permitted.
>>>>
>>>> Okay?
>>>
>>> That seems reasonable.
>>>
>>> CLONE_THREAD might be better to talk about.  The check is CLONE_VM
>>> because it is easier, and CLONE_THREAD implies CLONE_VM.
>>>
>>>> Having asked that, I realize that I'm still not quite comfortable with
>>>> this text. I think the problem is really one of terminology. At the
>>>> start of this passage in the page, there is the sentence:
>>>>
>>>> Every thread in a process must be in the
>>>> same PID namespace.
>>>>
>>>> Can you define "thread" in this context?
>>>
>>> Most definitely a thread group created with CLONE_THREAD.  It is pretty
>>> ugly in just the old fashioned CLONE_VM case too, but that might be
>>> legal.
>>>
>>> In a few cases I think the implementation overshoots and tests for VM
>>> sharing instead of thread group membership because VM sharing is easier
>>> to test for, and we already have tests for that.
>>
>> So, in summary, the point is that CLONE_VM is being used as a proxy
>> for CLONE_THREAD because the former is easier to test for, and
>> CLONE_THREAD requires CLONE_VM, right?
>
> I am totally lost about what problem we are trying to resolve in
> the text at this point.  So I am taking this opportunity to review
> what is actually happening and hopefully give a clear and useful
> explanation.

The problem is that the existing text talks about multithreaded
processes needing to be in the same PID namespace and then jumps to
talking about restrictions with CLONE_VM (not CLONE_THREAD). The
reader may not realize that CLONE_VM is a near synonym for
"multithreaded process".
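For reference, the call sequence under discussion can be exercised with a small C probe. This is a sketch under stated assumptions: glibc's clone() wrapper, a downward-growing stack (as on x86 and arm64), and a caller with CAP_SYS_ADMIN, since unshare(CLONE_NEWPID) normally fails with EPERM otherwise. The enum and probe_clone_vm_after_unshare() are hypothetical names. On kernels from the time of this thread, clone(CLONE_VM) after unshare(CLONE_NEWPID) is expected to fail with EINVAL; later kernels relaxed this check, so a recent kernel may report success instead.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define PROBE_STACK_SIZE (64 * 1024)

/* Hypothetical result codes, passed back as the helper's exit status. */
enum probe_result {
    PROBE_CLONE_VM_OK = 0,     /* clone(CLONE_VM) succeeded */
    PROBE_CLONE_VM_EINVAL = 1, /* clone(CLONE_VM) was refused with EINVAL */
    PROBE_CLONE_VM_OTHER = 2,  /* clone(CLONE_VM) failed with another errno */
    PROBE_UNSHARE_FAILED = 3,  /* unshare(CLONE_NEWPID) failed, e.g. EPERM */
    PROBE_SETUP_FAILED = 4     /* fork(), malloc() or waitpid() failed */
};

/* Body of the would-be thread: it exits immediately. */
static int probe_fn(void *arg)
{
    (void)arg;
    return 0;
}

/* Runs unshare(CLONE_NEWPID) followed by clone(CLONE_VM) in a forked
   helper and reports which step, if any, failed. */
int probe_clone_vm_after_unshare(void)
{
    pid_t helper = fork();
    if (helper == -1)
        return PROBE_SETUP_FAILED;

    if (helper == 0) {
        /* Changes the PID namespace for the helper's future children only. */
        if (unshare(CLONE_NEWPID) == -1)
            _exit(PROBE_UNSHARE_FAILED);

        char *stack = malloc(PROBE_STACK_SIZE);
        if (stack == NULL)
            _exit(PROBE_SETUP_FAILED);

        /* Share the address space, as a thread would; pass the stack top. */
        pid_t pid = clone(probe_fn, stack + PROBE_STACK_SIZE,
                          CLONE_VM | SIGCHLD, NULL);
        if (pid == -1)
            _exit(errno == EINVAL ? PROBE_CLONE_VM_EINVAL
                                  : PROBE_CLONE_VM_OTHER);

        waitpid(pid, NULL, 0);
        _exit(PROBE_CLONE_VM_OK);
    }

    int status;
    if (waitpid(helper, &status, 0) == -1 || !WIFEXITED(status))
        return PROBE_SETUP_FAILED;
    return WEXITSTATUS(status);
}
```

Running the sequence in a throwaway helper keeps the caller's own later fork(2) calls out of the new namespace, whose first child (its init) exits immediately in this probe.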

However, the text you provide here is wonderful detail:

> The clone flags have some dependencies.
> CLONE_SIGHAND requires CLONE_VM.
> CLONE_THREAD requires CLONE_SIGHAND.
>
> Ultimately there are cases in here that are too strange to think about,
> and that no one cares about (except, so far, to document what is going
> on).  The fundamental goal of these checks is simply to disallow the
> cases that are too strange to think about.
>
> From a technical point of view CLONE_THREAD requires being in the same
> PID namespace so that you can send signals to other threads in your
> process, and so that you can see all of the threads of your process in
> /proc.
>
> From a technical point of view CLONE_SIGHAND requires being in the same
> PID namespace because we need to know how to encode the PID of the
> sending process at the time a signal is enqueued in the destination
> queue.  A signal queue shared by processes in multiple PID namespaces
> will defeat that.
>
> From a technical point of view CLONE_VM requires all of the threads to
> be in the same PID namespace, because from the point of view of coredump code
> if two processes share the same address space they are threads and will
> be core dumped together.  When a coredump is written the pid of each
> thread is written into the coredump.  Writing the pids could not
> meaningfully succeed if some of the pids were in a parent PID namespace.
>
> Therefore there is a technical requirement for each of CLONE_THREAD,
> CLONE_SIGHAND, CLONE_VM to share a PID namespace.
>
> In the kernel code, testing only for CLONE_VM is a shorthand for
> testing for any of CLONE_THREAD, CLONE_SIGHAND, or CLONE_VM.

I will incorporate most of the above into the page.
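The flag dependencies listed above can be observed from user space without any privileges. A minimal sketch, assuming glibc's clone() wrapper and a downward-growing stack; clone_errno() is a hypothetical helper that returns 0 if clone(2) accepted the flags (after reaping the child) or the errno it failed with.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define DEP_STACK_SIZE (64 * 1024)

/* The new task does nothing and exits at once. */
static int dep_child_fn(void *arg)
{
    (void)arg;
    return 0;
}

/* Try clone(2) with the given sharing flags plus SIGCHLD as the exit
   signal. Returns 0 on success (the child is reaped) or the errno. */
int clone_errno(int flags)
{
    char *stack = malloc(DEP_STACK_SIZE);
    if (stack == NULL)
        return ENOMEM;

    pid_t pid = clone(dep_child_fn, stack + DEP_STACK_SIZE,
                      flags | SIGCHLD, NULL);
    if (pid == -1) {
        int err = errno;
        free(stack);
        return err;
    }

    /* With CLONE_VM the child ran on this buffer: free it only once reaped. */
    waitpid(pid, NULL, 0);
    free(stack);
    return 0;
}
```

For example, clone_errno(CLONE_SIGHAND) should give EINVAL (CLONE_SIGHAND without CLONE_VM), as should clone_errno(CLONE_VM | CLONE_THREAD) (CLONE_THREAD without CLONE_SIGHAND), while clone_errno(CLONE_VM | CLONE_SIGHAND) should succeed. The full CLONE_VM | CLONE_SIGHAND | CLONE_THREAD combination is deliberately not probed: a CLONE_THREAD child cannot be waited for with waitpid(2), so this helper could free its stack while it is still running.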

> On the flip side, the implicit addition of CLONE_THREAD by
> unshare(CLONE_NEWPID) actually appears to be bogus

I agree that it seems strange.
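To make the objection concrete, here is a deliberately simplified model, not kernel code, of what the implied flag does. model_unshare_newpid_errno() and its parameters are hypothetical; the model assumes, as described above, that unshare(CLONE_NEWPID) silently adds CLONE_THREAD, and that unshare(2) then fails with EINVAL if the caller is multithreaded.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of unshare(CLONE_NEWPID); not kernel code.
 *   multithreaded:         the caller has more than one thread
 *   newpid_implies_thread: CLONE_NEWPID silently adds CLONE_THREAD
 * Returns 0 if the call would be allowed, otherwise the errno.
 */
int model_unshare_newpid_errno(int flags, bool multithreaded,
                               bool newpid_implies_thread)
{
    if (newpid_implies_thread && (flags & CLONE_NEWPID))
        flags |= CLONE_THREAD;

    /* unshare(2) refuses CLONE_THREAD when the caller is multithreaded. */
    if ((flags & CLONE_THREAD) && multithreaded)
        return EINVAL;

    return 0;
}
```

Setting newpid_implies_thread to false, which is roughly what dropping the check would mean, lets a multithreaded caller through, in line with the setns(CLONE_NEWPID) case Eric mentions.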

Cheers,

Michael

> because we do not
> change the pid namespace of the process calling unshare (only its
> children), and we already allow that case with setns.  I need to think
> about that case a little more but I am going to queue up a patch for
> 3.10 to make unshare(CLONE_NEWPID) and setns(CLONE_NEWPID) consistent.
> Probably by removing the check in unshare(CLONE_NEWPID).
>
> I need to think a bit about what happens from the threaded parent's
> perspective when different threads can create children in different PID
> namespaces. Does it introduce weird, hard-to-support cases into the code?
> Or will it just work without requiring anything special, so that I can
> allow it?
>
> Eric



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers


