On Thu, Nov 28, 2019 at 06:24:05PM +0100, Christian Brauner wrote:
> On Thu, Nov 28, 2019 at 01:46:50PM +0100, Adrian Reber wrote:
> > Signed-off-by: Adrian Reber <areber@xxxxxxxxxx>
> > ---
> >  man2/clone.2 | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 90 insertions(+)
> >
> > diff --git a/man2/clone.2 b/man2/clone.2
> > index 076b9258e..59c13ec35 100644
> > --- a/man2/clone.2
> > +++ b/man2/clone.2
> > @@ -195,6 +195,8 @@ struct clone_args {
> >      u64 stack;        /* Pointer to lowest byte of stack */
> >      u64 stack_size;   /* Size of stack */
> >      u64 tls;          /* Location of new TLS */
> > +    u64 set_tid;      /* Pointer to a \fIpid_t\fP array */
> > +    u64 set_tid_size; /* Number of elements in \fIset_tid\fP */
> >  };
> >  .EE
> >  .in
> > @@ -262,6 +264,8 @@ flags & 0xff    exit_signal
> >  stack           stack
> >  \fP---\fP       stack_size
> >  tls             tls             See CLONE_SETTLS
> > +\fP---\fP       set_tid         See below for details
> > +\fP---\fP       set_tid_size
> >  .TE
> >  .RE
> >  .\"
> > @@ -285,6 +289,74 @@ options when waiting for the child with
> >  If no signal (i.e., zero) is specified, then the parent process is not signaled
> >  when the child terminates.
> >  .\"
> > +.SS The set_tid array
> > +.PP
> > +The
> > +.I set_tid
> > +array is used to select a certain PID for the process to be created by
> > +.BR clone3 ().
> > +If the PID of the newly created process should only be set for the current
> > +PID namespace or in the newly created PID namespace (if
> > +.I flags
> > +contains
> > +.BR CLONE_NEWPID )
> > +then the first element in the
> > +.I set_tid
> > +array has to be the desired PID and
> > +.I set_tid_size
> > +needs to be 1.
> > +.PP
> > +If the PID of the newly created process should have a certain value in
> > +multiple PID namespaces the
> > +.I set_tid
> > +array can have multiple entries. The first entry defines the PID in the most
> > +nested PID namespace and all following entries contain the PID of the
> > +corresponding parent PID namespace. The number of PID namespaces in which a PID
> > +should be set is defined by
> > +.I set_tid_size
> > +which cannot be larger than the number of currently nested PID namespaces.
>
> "It's upper cap is the kernel-enforced general nesting limit."
> or sm like that

Is that an addition to my sentence or a replacement? I think at this
point it is more important to point out that it cannot be larger than
the number of currently nested PID namespaces. Later (at EPERM) I also
mention that it cannot be larger than the maximum number of nested PID
namespaces.

The code does indeed check whether set_tid_size is larger than the
maximum number of possible nested PID namespaces, but for the user
calling clone3(), I think it is more relevant that set_tid_size is not
larger than the number of currently nested PID namespaces. The maximum
number of possible nested PID namespaces is more likely enforced during
unshare() or CLONE_NEWPID (which could happen at the same time as
set_tid_size being larger than the maximum number of nested PID
namespaces).

This definitely feels like too much discussion for a single sentence ;)

I can add a sentence about the maximum number of nested PID namespaces
here in addition to the one at EPERM, but I do not think it is relevant
for the user at this point.

		Adrian
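
For illustration, here is a minimal sketch of the rule discussed above:
set_tid_size must not exceed the number of currently nested PID
namespaces. It does not call clone3(). The struct name
sketch_clone_args, the helper sketch_fill_set_tid() and its
current_nesting parameter are hypothetical and exist only for this
example. The fields before "stack" are assumed from the existing
clone_args layout, since the quoted hunk does not show them, and the
caller is assumed to know the current nesting depth.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Local mirror of struct clone_args including the two new fields from
 * the quoted patch. Hypothetical name, so that it cannot clash with
 * the definition in <linux/sched.h>.
 */
struct sketch_clone_args {
	uint64_t flags;
	uint64_t pidfd;
	uint64_t child_tid;
	uint64_t parent_tid;
	uint64_t exit_signal;
	uint64_t stack;        /* Pointer to lowest byte of stack */
	uint64_t stack_size;   /* Size of stack */
	uint64_t tls;          /* Location of new TLS */
	uint64_t set_tid;      /* Pointer to a pid_t array */
	uint64_t set_tid_size; /* Number of elements in set_tid */
};

/*
 * Hypothetical helper: store the requested PIDs in args.
 * pids[0] is the PID in the most nested PID namespace, the following
 * entries belong to the corresponding parent PID namespaces.
 * current_nesting is supplied by the caller (an assumption of this
 * sketch; nothing is queried from the kernel here).
 * Returns 0 on success and -1 if more PIDs are requested than there
 * are currently nested PID namespaces; args is left untouched then.
 */
static int sketch_fill_set_tid(struct sketch_clone_args *args,
			       const pid_t *pids, size_t count,
			       size_t current_nesting)
{
	assert(args != NULL);

	if (count > current_nesting)
		return -1;

	args->set_tid = (uint64_t)(uintptr_t)pids;
	args->set_tid_size = count;
	return 0;
}
```

A real caller would fill the kernel's own struct clone_args in the same
way and pass it to clone3() through syscall(2). That additionally needs
a kernel with set_tid support and sufficient privileges.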