Serge E. Hallyn wrote:
> Quoting Sukadev Bhattiprolu (sukadev@xxxxxxxxxxxxxxxxxx):
>> Subject: [RFC][v4][PATCH 7/7]: Define clone_extended() syscall
>>
>> Container restart requires that a task have the same pid it had when it was
>> checkpointed. When containers are nested the tasks within the containers
>> exist in multiple pid namespaces and hence have multiple pids to specify
>> during restart.
>>
>> This patch defines a new system call, clone_extended(), which is like clone(),
>> but takes a new 'pid_set' parameter. This parameter lets the caller choose
>> specific pid numbers for the child process, in the process's active and
>> ancestor pid namespaces. (Descendant pid namespaces in general don't matter
>> since processes don't have pids in them anyway, but see comments in
>> copy_target_pids() regarding CLONE_NEWPID).
>>
>> Unlike clone(), however, clone_extended() needs CAP_SYS_ADMIN, at least for
>> now, to prevent unprivileged processes from misusing this interface.
>
> It only needs that when specifying pids.
>
>> While the main motivation for this interface is the need to let a process
>> choose its 'pid numbers', the clone_extended() interface uses 64-bit clone
>> flags. The 'higher' portion of the clone flags is unused and is only
>> included to preclude yet another version of clone when a new clone flag is
>> needed.
>>
>> ===== Interface:
>>
>> Compared to clone(), clone_extended() needs to pass in three more pieces
>> of information:
>>
>>     - additional 32 bits of clone_flags
>>     - number of pids in the set
>>     - user buffer containing the list of pids.
>>
>> But since clone() already takes 5 parameters and some (all?) architectures
>> are restricted to 6 parameters to a system-call, additional data-structures
>> (and copy_from_user()) are needed.
>>
>> The proposed interface for clone_extended() is:
>>
>>     struct clone_tid_info {
>>         void *parent_tid;    /* parent_tid_ptr parameter */
>>         void *child_tid;     /* child_tid_ptr parameter */
>>     };
>>
>>     struct pid_set {
>>         int num_pids;
>>         pid_t *pids;
>>     };
>>
>>     int clone_extended(int flags_low, int flags_high, void *child_stack,
>>                        void *unused, struct clone_tid_info *tid_ptrs,
>>                        struct pid_set *pid_setp);
>
> I was thinking additional flags would be passed in the (renamed)
> struct pid_set.

Yes. But maybe in (renamed) 'struct clone_info' instead of 'struct pid_set'?

I vaguely recall a strong preference to not require copy-from-user during a
fast-path clone, because it may hurt performance. *If* this is the case, then
maybe place extra flags among the "base" args, or at least a CLONE_EXTRA
would indicate that more arguments need to be pulled from user-space?

Do you intend to get feedback from LKML too?

Oren.

_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/containers
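
To make the CLONE_EXTRA idea above concrete, here is a rough userspace sketch of how the argument gathering might look. It is an illustration only and not code from the patch: the layout of 'struct clone_info', the bit value chosen for CLONE_EXTRA, and the helper name gather_clone_args() are all assumptions, and a plain pointer dereference stands in for copy_from_user(). The point is just that the extra structure is only touched when the flag is set, so an ordinary clone stays on the fast path.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Assumed placeholder bit; the thread does not assign CLONE_EXTRA a value. */
#define CLONE_EXTRA 0x80000000u

/* Hypothetical layout for the renamed structure discussed above. */
struct clone_info {
    uint32_t flags_high;  /* upper 32 bits of the 64-bit clone flags */
    int      num_pids;    /* number of entries in pids[] */
    pid_t   *pids;        /* requested pids, one per pid namespace level */
};

/*
 * Combine the base flags with the optional extra arguments.
 * 'info' is read only when CLONE_EXTRA is set in flags_low, which models
 * skipping copy_from_user() on the common path.
 * Returns 0 on success, -1 if CLONE_EXTRA is set but 'info' is unusable.
 */
int gather_clone_args(uint32_t flags_low, const struct clone_info *info,
                      uint64_t *flags_out, int *num_pids_out)
{
    *flags_out = flags_low;
    *num_pids_out = 0;

    if (!(flags_low & CLONE_EXTRA))
        return 0;                       /* fast path: nothing more to fetch */

    if (info == NULL || info->num_pids < 0)
        return -1;                      /* extra args requested but invalid */

    *flags_out |= (uint64_t)info->flags_high << 32;
    *num_pids_out = info->num_pids;
    return 0;
}
```

With this shape, a caller that does not set CLONE_EXTRA can pass NULL for the extra structure and pay nothing for it, while a restart tool would set the flag and supply the pid list.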