Re: [PATCH] setpgid.2, exit.3: document the lack of POSIX-specified behaviour inside PID NS

On Tue, Mar 12, 2019 at 04:17:32PM -0500, Eric W. Biederman wrote:
> "Dmitry V. Levin" <ldv@xxxxxxxxxxxx> writes:
> > On Sun, Mar 10, 2019 at 11:59:26AM -0500, Eric W. Biederman wrote:
> >> "Dmitry V. Levin" <ldv@xxxxxxxxxxxx> writes:
> >> > On Thu, Mar 07, 2019 at 09:02:07PM +0100, Eugene Syromyatnikov wrote:
> >> >> On Thu, Mar 7, 2019 at 8:05 PM Eugene Syromyatnikov <evgsyr@xxxxxxxxx> wrote:
> >> >> >
> >> >> > The POSIX-mandated behaviour of sending SIGCONT/SIGHUP to stopped processes
> >> >> > of an orphaned process group is not observed inside PID namespaces, as
> >> >> > can be verified by running [1] inside a PID namespace, for example.
> >> >> >
> >> >> > The derivation is (presumably) introduced by Linux commit
> >> >> s/derivation/deviation/
> >> >> 
> >> >> > v2.6.24-rc1~237 ("pid namespaces: define is_global_init() and
> >> >> > is_container_init()").
> >> >> >
> >> >> > [1] https://gitlab.com/strace/strace/commit/4278e6613f48273e7da0989712f1c18aaffefd84
> >> >> >
> >> >> > Reported-by: Dmitry V. Levin <ldv@xxxxxxxxxxxx>
> >> >> > Signed-off-by: Eugene Syromyatnikov <evgsyr@xxxxxxxxx>
> >> >> 
> >> >> It should probably also be noted that the behaviour is also described
> >> >> in TLPI, Section 34.8 ("Process groups, sessions, and job control:
> >> >> Summary"), so it also likely has to be updated.
> >> >
> >> > Strictly speaking, whether orphaned process group semantics works in
> >> > a PID namespace or not depends on the session ID.  If the session ID is
> >> > the same as the session ID of init (which happens quite often in case
> >> > of a PID namespace), then orphaned process group semantics doesn't work.
> >> > If they differ, then the POSIX-mandated behaviour is supported.
> >> 
> >> 
> >> http://pubs.opengroup.org/onlinepubs/9699919799 says:
> >> > 3.264:  Orphaned Process Group
> >> > 
> >> > A process group in which the parent of every member is either itself
> >> > a member of the process group or is not a member of the group's session.
> >> 
> >> It does not say anything about init.
> >
> > No, it doesn't say anything about init.
> >
> > I'm saying that the current linux behaviour is not conforming because
> > the POSIX-mandated orphaned process group semantics is not implemented
> > for the case when the session ID is the same as the session ID of init
> > in the PID namespace.
> 
> The description below sounds like a real problem that breaks existing
> software.
> 
> I don't see how I can say that code working exactly as specified by
> POSIX is non-conforming.  I will definitely say that there is an issue.
> 
> >> Which makes the current version of orphaned process group handling posix
> >> conformant.  By not ignoring the pid namespace init the code may not be
> >> backwards compatible with the rest of linux.    Which may be a problem
> >> worth addressing, either in the documentation or in the code.
> >> 
> >> It is not a break from posix.
> >> 
> >> Where is this behavior a problem?
> >
> > It is a problem in GitLab CI and whoever else uses docker-like
> > containerization in a simple way.
> >
> > One of our complex strace tests passed everywhere except GitLab CI where
> > it failed.  We were at a loss to find out why until we suspected the
> > kernel and wrote a simple test for the orphaned process group semantics.
> > When that simple test passed everywhere except GitLab CI where it failed,
> > we suspected PID namespaces and reproduced the failure using "unshare
> > -fp".
> 
> Thank you.  That description helps a lot.
> 
> At a minimum it sounds like we should document this case as a potential
> problem and fix docker to not do that.
> 
> I am open to changes of behavior in the kernel but I want to make
> certain they are well justified before I make anything so if possible
> other regressions and complications are not introduced.

I see your point.  Unfortunately, a simple replacement of is_global_init()
with is_container_init() in will_become_orphaned_pgrp() can indeed change
the behaviour when the pid namespace init is the natural parent of
processes in the process group specified to will_become_orphaned_pgrp().

> The intended semantics are that sessions and process groups can span
> pid namespaces.  So I need to wrap my head around what makes what
> happens in a pid namespace special that causes problems.  Is it the
> reparenting to the pid namespace init?  Or do we just have a case where
> the session is set up in a funny way the process group looks orphaned
> from inside the process group but it does not actually act orphaned.

It is the reparenting to the pid namespace init.

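In the test, termination of the process group leader leads to reparenting
of its children to the pid namespace init.  will_become_orphaned_pgrp()
skips the session check only for processes reparented to the global init.
A pid namespace init gets no such exemption, so when it is in the same
session as the reparented children, the process group is not treated as
orphaned.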
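
To make the case analysis concrete, here is a rough user-space model of
that check.  It is only a sketch: struct task, pgrp_is_orphaned() and
model_case() are hypothetical names invented for illustration, not kernel
code, and the model assumes the check boils down to the POSIX definition
plus a skip for processes whose parent is the global init.  Everything
else the kernel does there is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a task; not the kernel's task_struct. */
struct task {
	int pgrp;			/* process group ID */
	int session;			/* session ID */
	bool is_global_init;		/* true only for the global init */
	const struct task *parent;	/* current (possibly reparented) parent */
};

/*
 * Model of the orphan check: the group is not orphaned if some member has
 * a parent that is outside the group but inside the same session.
 * Members whose parent is the global init are skipped (assumption: this
 * mirrors the is_global_init() test discussed in this thread).
 */
static bool
pgrp_is_orphaned(const struct task *const *tasks, size_t n, int pgrp)
{
	for (size_t i = 0; i < n; i++) {
		const struct task *p = tasks[i];

		if (p->pgrp != pgrp)
			continue;	/* not a member of this group */
		assert(p->parent != NULL);
		if (p->parent->is_global_init)
			continue;	/* global init never keeps a group alive */
		if (p->parent->pgrp != pgrp &&
		    p->parent->session == p->session)
			return false;	/* parent still links the group to its session */
	}
	return true;
}

/* One child in group 100, already reparented to an init in group 1, session 1. */
static bool
model_case(bool parent_is_global_init, bool same_session)
{
	const struct task init = {
		.pgrp = 1, .session = 1,
		.is_global_init = parent_is_global_init, .parent = NULL,
	};
	const struct task child = {
		.pgrp = 100, .session = same_session ? 1 : 100,
		.is_global_init = false, .parent = &init,
	};
	const struct task *const tasks[] = { &child };

	return pgrp_is_orphaned(tasks, 1, 100);
}
```

In this model, model_case(false, true) corresponds to the plain unshare
invocation quoted below (namespace init in the same session, group not
considered orphaned), and model_case(false, false) to the setsid variant
(sessions differ, group considered orphaned).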

> Hmm.  It looks like you have answered my questions with your test
> program orphaned_process_group.  Is there source anywhere handy that I
> can read it?

Sure, it's a part of strace testsuite:
https://gitlab.com/strace/strace/blob/master/tests/orphaned_process_group.c
https://github.com/strace/strace/blob/master/tests/orphaned_process_group.c

> >> > For example:
> >> >
> >> > $ unshare -fprU sh -c './orphaned_process_group >/dev/null' && echo good || echo bad
> >> > Orphaned process group semantics is not supported by the kernel
> >> > bad
> >> > $ unshare -fprU sh -c 'setsid ./orphaned_process_group >/dev/null' && echo good || echo bad
> >> > good
> >> >
> >> > What can I say?  The very least that could be done to fix this is
> >> > to replace is_global_init() invocation with is_container_init()
> >> > in will_become_orphaned_pgrp() as suggested in
> >> > https://lkml.org/lkml/2007/12/8/208
> 
> Thank you very much,
> Eric Biederman

-- 
ldv
