Eric W. Biederman [ebiederm@xxxxxxxxxxxx] wrote:
| Sukadev Bhattiprolu <sukadev@xxxxxxxxxxxxxxxxxx> writes:
|
| > Prevent container-inits from using CLONE_PARENT
| >
| > If a container-init creates a sibling (using CLONE_PARENT), pid namespace
| > semantics become complicated:
| >
| > - the "active pid namespace" of the sibling will be the descendant
| >   container, but it's not obvious if that is correct.
|
| It is correct: the sibling must not change pid namespaces. You are not
| allowed to escape out of a pid namespace.
|
| > - if container-init exits, it will terminate the sibling, but again
| >   it's not clear if that is the correct behavior.
|
| Again correct, because the container-init is the child reaper for the pid namespace.
| No reaper, no namespace.
|
| > - the sibling exists in both parent and child containers while current
| >   pid namespace semantics assume that only container-init can exist
| >   in both parent/child containers.
|
| All tasks in the container also exist in the parent container.
| What assumption are you talking about?

You are right, that's not really different for CLONE_PARENT.

|
| > - the parent of the sibling is not a descendant of container-init
| >   (while pid namespaces assume that all processes in the container
| >   are descendants of the container-init)
|
| User space certainly assumes that. What part of the pid namespace
| code makes such an assumption?

I was referring only to the user-space view.

|
| > - When the sibling dies, the SIGCHLD is sent to its parent (if
| >   alive), i.e. the signal escapes the container to a parent container.
| >   (If the parent of the sibling exits, the container-init then becomes
| >   the reaper of the sibling.)
|
| Yes.
|
| > To keep pid namespace semantics simple, prevent container-inits from using
| > CLONE_PARENT at least until we have a better understanding of CLONE_PARENT
| > and pid-namespace interactions.
|
| The only argument that I can see that carries any weight is that unix
| semantics fundamentally assume a process tree. Allowing init to use
| CLONE_PARENT creates a multi-rooted process tree.

Right.

|
| At which point the is_global_init check is foolish.

Well, I was trying to disable CLONE_PARENT only with pid namespaces.
Disabling CLONE_PARENT for global init seemed independent of namespaces,
and there was recent talk of potential users of CLONE_PARENT, so I am not
sure whether there is an init that uses the old threading model!

I don't have a convincing reason besides "let's enable it once the uses and
semantics of CLONE_PARENT with pid namespaces are clear".

|
| Eric
|
|
| > Untested, RFC patch :-)
| >
| > Signed-off-by: Sukadev Bhattiprolu <sukadev@xxxxxxxxxx>
| > ---
| >  kernel/fork.c |    8 ++++++++
| >  1 file changed, 8 insertions(+)
| >
| > Index: linux-mmotm/kernel/fork.c
| > ===================================================================
| > --- linux-mmotm.orig/kernel/fork.c	2009-06-17 18:23:23.000000000 -0700
| > +++ linux-mmotm/kernel/fork.c	2009-06-17 19:17:54.000000000 -0700
| > @@ -974,6 +974,14 @@ static struct task_struct *copy_process(
| >  	if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
| >  		return ERR_PTR(-EINVAL);
| >  
| > +	/*
| > +	 * To keep pid namespace semantics simple, prevent container-inits
| > +	 * from creating siblings.
| > +	 */
| > +	if ((clone_flags & CLONE_PARENT) &&
| > +		is_container_init(current) && !is_global_init(current))
| > +		return ERR_PTR(-EINVAL);
| > +
| >  	retval = security_task_create(clone_flags);
| >  	if (retval)
| >  		goto fork_out;

_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers