Hello Adrian, Christian,

On Tue, 17 Dec 2019 at 16:05, Adrian Reber <areber@xxxxxxxxxx> wrote:
>
> Signed-off-by: Adrian Reber <areber@xxxxxxxxxx>
> ---
> v2: applied changes from review (Michael and Christian)
>
> v3: added explanation about needing a PID 1 in a PID namespace
> ---
>  man2/clone.2 | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 99 insertions(+)

Thanks, Adrian. Patch applied (and a few tweaks added; see [1]).
Christian, thanks for the review.

Cheers,

Michael

[1] https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=ee8bb310d8d16792723b9f69fcb9c6797cb07e79

> diff --git a/man2/clone.2 b/man2/clone.2
> index 076b9258e..15a1b56f6 100644
> --- a/man2/clone.2
> +++ b/man2/clone.2
> @@ -195,6 +195,8 @@ struct clone_args {
>      u64 stack;        /* Pointer to lowest byte of stack */
>      u64 stack_size;   /* Size of stack */
>      u64 tls;          /* Location of new TLS */
> +    u64 set_tid;      /* Pointer to a \fIpid_t\fP array */
> +    u64 set_tid_size; /* Number of elements in \fIset_tid\fP */
>  };
>  .EE
>  .in
> @@ -262,6 +264,8 @@ flags & 0xff	exit_signal
>  stack	stack
>  \fP---\fP	stack_size
>  tls	tls	See CLONE_SETTLS
> +\fP---\fP	set_tid	See below for details
> +\fP---\fP	set_tid_size
>  .TE
>  .RE
>  .\"
> @@ -285,6 +289,80 @@ options when waiting for the child with
>  If no signal (i.e., zero) is specified, then the parent process is not signaled
>  when the child terminates.
>  .\"
> +.SS The set_tid array
> +.PP
> +By default, the kernel chooses the next sequential PID for the new
> +process in each of the PID namespaces where it is present.
> +When creating a process with
> +.BR clone3 (),
> +the
> +.I set_tid
> +array can be used to select specific PIDs for the process in some
> +or all of the PID namespaces where it is present.
> +If the PID of the newly created process should only be set for the current
> +PID namespace or in the newly created PID namespace (if
> +.I flags
> +contains
> +.BR CLONE_NEWPID )
> +then the first element in the
> +.I set_tid
> +array has to be the desired PID and
> +.I set_tid_size
> +needs to be 1.
> +.PP
> +If the PID of the newly created process should have a certain value in
> +multiple PID namespaces the
> +.I set_tid
> +array can have multiple entries. The first entry defines the PID in the most
> +deeply nested PID namespace and all following entries contain the PID of the
> +corresponding parent PID namespace. The number of PID namespaces in which a PID
> +should be set is defined by
> +.I set_tid_size
> +which cannot be larger than the number of currently nested PID namespaces.
> +.PP
> +To create a process with the following PIDs in a PID namespace hierarchy:
> +.RS
> +.TS
> +lb lb
> +l l .
> +PID NS level	Requested PID
> +0 (host)	31496
> +1	42
> +2	7
> +.TE
> +.RE
> +.PP
> +Set the array to:
> +.PP
> +.EX
> +    set_tid[0] = 7;
> +    set_tid[1] = 42;
> +    set_tid[2] = 31496;
> +    set_tid_size = 3;
> +.EE
> +.PP
> +If only the PIDs in the two innermost PID namespaces
> +need to be specified, set the array to:
> +.PP
> +.EX
> +    set_tid[0] = 7;
> +    set_tid[1] = 42;
> +    set_tid_size = 2;
> +.EE
> +.PP
> +The PID in the PID namespaces outside the two innermost PID namespaces
> +will be selected the same way as any other PID is selected.
> +.PP
> +The
> +.I set_tid
> +feature requires
> +.RB CAP_SYS_ADMIN
> +in all owning user namespaces of the target PID namespaces.
> +.PP
> +Callers may only choose a PID > 1 in a given PID namespace if an init
> +process (i.e. a process with PID 1) already exists. Otherwise the PID
> +entry for this PID namespace must be 1.
> +.\"
>  .SS The flags mask
>  .PP
>  Both
> @@ -1201,6 +1279,11 @@ will be set appropriately.
>  Too many processes are already running; see
>  .BR fork (2).
>  .TP
> +.BR EEXIST " (" clone3 "() only)"
> +One or more of the PIDs specified in
> +.I set_tid
> +already exists in the corresponding PID namespace.
> +.TP
>  .B EINVAL
>  .B CLONE_SIGHAND
>  was specified in the
> @@ -1379,6 +1462,15 @@ in the
>  .I flags
>  mask.
>  .TP
> +.BR EINVAL " (" clone3 "() only)"
> +.I set_tid_size
> +larger than current number of nested PID namespaces.
> +.TP
> +.BR EINVAL " (" clone3 "() only)"
> +If one of the PIDs specified in
> +.I set_tid
> +was an invalid PID.
> +.TP
>  .B ENOMEM
>  Cannot allocate sufficient memory to allocate a task structure for the
>  child, or to copy those parts of the caller's context that need to be
> @@ -1450,6 +1542,13 @@ mask and the caller is in a chroot environment
>  (i.e., the caller's root directory does not match the root directory
>  of the mount namespace in which it resides).
>  .TP
> +.BR EPERM " (" clone3 "() only)"
> +.I set_tid_size
> +was greater than zero, and the caller lacks the
> +.B CAP_SYS_ADMIN
> +capability in one or more of the user namespaces that own the
> +corresponding PID namespaces.
> +.TP
>  .BR ERESTARTNOINTR " (since Linux 2.6.17)"
>  .\" commit 4a2c7a7837da1b91468e50426066d988050e4d56
>  System call was interrupted by a signal and will be restarted.
>
> base-commit: 5373f62f1e4352e665c24dfe49b7e3fe03721cab
> --
> 2.23.0
>

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
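
For readers of the archive, here is a minimal C sketch of how the set_tid
and set_tid_size fields described in the quoted patch might be filled in
before calling clone3(). It is an illustration under stated assumptions,
not code from the patch: struct clone3_args is a local mirror of the
kernel's struct clone_args, request_pids() and clone3_with_pids() are
hypothetical helper names, and the fallback syscall number 435 is assumed
to match the architecture in use. The call needs a kernel that supports
set_tid (Linux 5.5 or later) and, per the patch text, CAP_SYS_ADMIN in the
user namespaces owning the target PID namespaces.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_clone3
#define SYS_clone3 435  /* assumption: the number used on most architectures */
#endif

/* Local mirror of the kernel's struct clone_args up to set_tid_size.
   Hypothetical name, declared here so the sketch does not depend on the
   installed kernel headers being recent enough. */
struct clone3_args {
    uint64_t flags;
    uint64_t pidfd;
    uint64_t child_tid;
    uint64_t parent_tid;
    uint64_t exit_signal;
    uint64_t stack;
    uint64_t stack_size;
    uint64_t tls;
    uint64_t set_tid;       /* Pointer to a pid_t array */
    uint64_t set_tid_size;  /* Number of elements in set_tid */
};

/* Store the requested PIDs in args. tids[0] is the PID wanted in the most
   deeply nested PID namespace; each following entry is for the next
   parent namespace. (Hypothetical helper.) */
void request_pids(struct clone3_args *args, const pid_t *tids, size_t n)
{
    assert(tids != NULL || n == 0);
    args->set_tid = (uint64_t)(uintptr_t)tids;
    args->set_tid_size = n;
}

/* fork()-like wrapper: returns 0 in the child, the child's PID in the
   parent, or -1 with errno set on failure. (Hypothetical helper.) */
pid_t clone3_with_pids(const pid_t *tids, size_t n)
{
    struct clone3_args args;

    memset(&args, 0, sizeof(args));
    args.exit_signal = SIGCHLD;   /* parent is signaled like after fork() */
    request_pids(&args, tids, n);
    return (pid_t)syscall(SYS_clone3, &args, sizeof(args));
}
```

With the three-level example from the patch, the caller would pass an array
holding 7, 42 and 31496 (in that order) and n = 3. A return value of -1
with errno set to EEXIST, EINVAL or EPERM corresponds to the error cases
listed in the ERRORS hunks quoted above.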