On Mon, Apr 17, 2017 at 8:36 PM, Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote:
> While implementing nested pid namespace support in CRIU
> (the checkpoint-restore in userspace tool), we ran into
> a situation where it is impossible to efficiently create
> a task with a specific NSpid. After commit 49f4d8b93ccf
> "pidns: Capture the user namespace and filter ns_last_pid"
> it is impossible to set ns_last_pid on any pid namespace
> except the task's active pid_ns (before the commit it was possible
> to write to pid_ns_for_children). Thus, if a restored task
> in a container has more than one pid_ns level, the restorer
> code must have a helper task for every pid namespace
> in the task's pid_ns hierarchy.
>
> This is a big problem, because communicating with
> a helper for every pid_ns in the hierarchy is not cheap
> and performs poorly: it implies many helper wakeups
> to create a single task (regardless of how you communicate
> with the helpers). This patch tries to solve the problem.
>
> It introduces a new pid_ns ns_ioctl(PIDNS_REQ_SET_LAST_PID_VEC),
> which allows writing a vector of last pids across the pid_ns hierarchy.
> The vector is passed as a ":"-delimited string of pids,
> written in reverse order. The first number corresponds to
> the ns_last_pid of the opened namespace, the second to its parent, etc.
> So, if you have a pid namespace hierarchy like:
>
>       pid_ns1 (grand father)
>         |
>         v
>       pid_ns2 (father)
>         |
>         v
>       pid_ns3 (child)
>
> and the namespace of a task in pid_ns3 is open, then the corresponding
> vector will be "last_ns_pid3:last_ns_pid2:last_ns_pid1". The
> vector may be shorter and contain fewer levels, for example
> "last_ns_pid3:last_ns_pid2" or even "last_ns_pid3", depending
> on which levels you want to populate.
>
> To write to a pid_ns's ns_last_pid, we check that the writer task
> has CAP_SYS_ADMIN permissions in this pid_ns's user_ns.
>
> One note about struct pidns_ioc_req: it is made extensible and
> may be expanded in the future. The fields present at the moment
> will always exist; future fields and their sizes may be determined
> from pidns_ioc_req::req by future code.
>
> Signed-off-by: Kirill Tkhai <ktkhai@xxxxxxxxxxxxx>

Reviewed-by: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
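
For reference, the vector format described in the quoted text could be built in userspace roughly as follows. This is only a sketch: format_last_pid_vec() is a hypothetical helper, not part of the patch, and the layout of struct pidns_ioc_req and the ioctl call itself are deliberately left out because they are not spelled out above.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from the patch): format last-pid values as a
 * ":"-delimited vector. pids[0] is for the opened (innermost) namespace,
 * pids[1] for its parent, and so on; n may be smaller than the hierarchy
 * depth if only some levels should be populated.
 *
 * Returns the length of the resulting string, or -1 if buf is too small.
 */
int format_last_pid_vec(const pid_t *pids, size_t n, char *buf, size_t len)
{
	size_t off = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	for (size_t i = 0; i < n; i++) {
		/* Prefix every element except the first with ':' */
		int ret = snprintf(buf + off, len - off, "%s%d",
				   i ? ":" : "", (int)pids[i]);

		/* snprintf reports truncation by returning >= remaining space */
		if (ret < 0 || (size_t)ret >= len - off)
			return -1;
		off += (size_t)ret;
	}
	return (int)off;
}
```

With pids = {300, 20, 1} this yields "300:20:1", i.e. 300 for pid_ns3, 20 for pid_ns2 and 1 for pid_ns1 in the hierarchy from the example.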