Hello Adrian,

On 11/28/19 1:46 PM, Adrian Reber wrote:
Signed-off-by: Adrian Reber <areber@xxxxxxxxxx>
---
 man2/clone.2 | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)

diff --git a/man2/clone.2 b/man2/clone.2
index 076b9258e..59c13ec35 100644
--- a/man2/clone.2
+++ b/man2/clone.2
@@ -195,6 +195,8 @@ struct clone_args {
     u64 stack;        /* Pointer to lowest byte of stack */
     u64 stack_size;   /* Size of stack */
     u64 tls;          /* Location of new TLS */
+    u64 set_tid;      /* Pointer to a \fIpid_t\fP array */
+    u64 set_tid_size; /* Number of elements in \fIset_tid\fP */
 };
 .EE
 .in
@@ -262,6 +264,8 @@ flags & 0xff    exit_signal
 stack    stack
 \fP---\fP    stack_size
 tls    tls    See CLONE_SETTLS
+\fP---\fP    set_tid    See below for details
+\fP---\fP    set_tid_size
 .TE
 .RE
 .\"
@@ -285,6 +289,74 @@ options when waiting for the child with
 If no signal (i.e., zero) is specified, then the parent process is not signaled
 when the child terminates.
 .\"
+.SS The set_tid array
+.PP
+The
+.I set_tid
+array is used to select a certain PID for the process to be created by

s/is used/may be used/

Because it's not required to use this array, right? I mean, the
default is that the kernel chooses the PIDs. Perhaps this needs to be
more clearly stated at the start of this subsection. How about:

[[
By default, the kernel chooses the next sequential PID for the new
process in each of the PID namespaces where it is present.
When creating a process with
.BR clone3 (),
the
.I set_tid
array can be used to select specific PIDs for the process in some
or all of the PID namespaces where it is present.
]]

?

+.BR clone3 ().
+If the PID of the newly created process should only be set for the current
+PID namespace or in the newly created PID namespace (if
+.I flags
+contains
+.BR CLONE_NEWPID )
+then the first element in the
+.I set_tid
+array has to be the desired PID and
+.I set_tid_size
+needs to be 1.
+.PP
+If the PID of the newly created process should have a certain value in
+multiple PID namespaces the
+.I set_tid
+array can have multiple entries. The first entry defines the PID in the most

most *deeply* nested

+nested PID namespace and all following entries contain the PID of the
+corresponding parent PID namespace. The number of PID namespaces in which a PID
+should be set is defined by
+.I set_tid_size
+which cannot be larger than the number of currently nested PID namespaces.
+.PP
+To create a process with the following PIDs:
+.RS
+.TS
+lb lb
+l l .
+PID NS level    Requested PID
+0 (host)    31496
+1    42
+2    7
+.TE
+.RE
+.PP
+The
+.I set_tid
+array would need to be filled with:
+.PP
+.EX
+    set_tid[0] = 7;
+    set_tid[1] = 42;
+    set_tid[2] = 31496;
+    set_tid_size = 3;
+.EE
+.PP
+If only the PID of the two innermost PID namespaces
+should be defined it needs to be set like this:
+.PP
+.EX
+    set_tid[0] = 7;
+    set_tid[1] = 42;
+    set_tid_size = 2;
+.EE
+.PP
+The PID in the PID namespaces outside the two innermost PID namespaces
+is then selected the same way as any other PID is selected.
+.PP
+Only a privileged process
+.RB ( CAP_SYS_ADMIN )
+can set
+.I set_tid
+to select a PID for the process to be created.
+.\"
 .SS The flags mask
 .PP
 Both
@@ -1379,6 +1451,16 @@ in the
 .I flags
 mask.
 .TP
+.BR EINVAL " (" clone3 "() only)"
+.I set_tid_size
+larger than current number of nested PID namespaces or maximum number of
+nested PID namespaces was specified.
+.TP
+.BR EINVAL " (" clone3 "() only)"
+If one of the PIDs specified in
+.I set_tid
+was an invalid PID.
+.TP
 .B ENOMEM
 Cannot allocate sufficient memory to allocate a task structure for the
 child, or to copy those parts of the caller's context that need to be
@@ -1450,6 +1532,14 @@ mask and the caller is in a chroot environment
 (i.e., the caller's root directory does not match the root directory of the
 mount namespace in which it resides).
 .TP
+.BR EPERM " (" clone3 "() only)"
+If
+.I set_tid
+with
+.I set_tid_size
+larger than 0 was specified by an unprivileged process (process without
+\fBCAP_SYS_ADMIN\fP).
+.TP
 .BR ERESTARTNOINTR " (since Linux 2.6.17)"
 .\" commit 4a2c7a7837da1b91468e50426066d988050e4d56
 System call was interrupted by a signal and will be restarted.
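
To make the ordering concrete, here is a rough C sketch of how the example in the patch (31496 on the host, then 42, then 7) maps onto the array. The helper name fill_set_tid and its parameters are hypothetical, made up for illustration only; nothing like it exists in the patch or in the kernel interface. It only shows that entries run from the most deeply nested PID namespace outward.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * pids_by_level[] holds the requested PIDs indexed by PID namespace
 * level, starting at level 0 (host), as in the table in the patch.
 * 'count' is how many of the innermost levels should get a fixed PID.
 * set_tid[] is filled in the order the patch text describes: the most
 * deeply nested PID namespace first, then each parent namespace.
 * The return value is what would be stored in set_tid_size.
 */
static size_t fill_set_tid(const pid_t *pids_by_level, size_t levels,
                           size_t count, pid_t *set_tid)
{
    /* Cannot request more entries than there are nested levels. */
    assert(count <= levels);

    for (size_t i = 0; i < count; i++)
        set_tid[i] = pids_by_level[levels - 1 - i];

    return count;
}
```

Going by the struct in the patch, the address of the filled array would then be stored in the u64 set_tid field and the returned count in set_tid_size.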

Thanks,

Michael