On 27.04.2017 19:39, Eric W. Biederman wrote:
> Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> writes:
>
>> On 27.04.2017 19:12, Oleg Nesterov wrote:
>>> On 04/26, Kirill Tkhai wrote:
>>>>
>>>> On 26.04.2017 18:53, Oleg Nesterov wrote:
>>>>>
>>>>>> +static long set_last_pid_vec(struct pid_namespace *pid_ns,
>>>>>> +			     struct pidns_ioc_req *req)
>>>>>> +{
>>>>>> +	char *str, *p;
>>>>>> +	int ret = 0;
>>>>>> +	pid_t pid;
>>>>>> +
>>>>>> +	read_lock(&tasklist_lock);
>>>>>> +	if (!pid_ns->child_reaper)
>>>>>> +		ret = -EINVAL;
>>>>>> +	read_unlock(&tasklist_lock);
>>>>>> +	if (ret)
>>>>>> +		return ret;
>>>>>
>>>>> why do you need to check ->child_reaper under tasklist_lock? this looks pointless.
>>>>>
>>>>> In fact I do not understand how it is possible to hit pid_ns->child_reaper == NULL,
>>>>> there must be at least one task in this namespace, otherwise you can't open a file
>>>>> which has f_op == ns_file_operations, no?
>>>>
>>>> Sure, it's impossible to pick a pid_ns, if there is no the pid_ns's tasks. I added
>>>> it under impression of
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=dfda351c729733a401981e8738ce497eaffcaa00
>>>> but here it's completely wrong. It will be removed in v2.
>>>
>>> Hmm. But if I read this commit correctly then we really need to check
>>> pid_ns->child_reaper != NULL ?
>>>
>>> Currently we can't pick an "empty" pid_ns. But after the commit above a task
>>> can do sys_unshare(CLONE_NEWPID), another (or the same) task can open its
>>> /proc/$pid/ns/pid_for_children and call ns_ioctl() before the 1st alloc_pid() ?
>>
>> Another task can't open /proc/$pid/ns/pid_for_children before the 1st alloc_pid(),
>> because pid_for_children is available to open only after the 1st alloc_pid().
>> So, it's impossible to call ioctl() on it.
>
> That sounds reasonable.
>
> There is definitely the chance of the child_reaper dying after we have
> joined a pid namespace. So child_reaper can be stale if not NULL.
>
> As long as we don't mess up the first pid allocation I don't
> see any reason why we should care about last_pid in a pid_namespace.
> And this ioctl can be used to set all of the other pids on the first
> pid allocation by calling it in the parent pid namespace.
>
> There is still the chance of racing with a pid reaper dying. Why do we
> care about child_reaper in this case?
>
> Changing last_pid is completely pointless if child_reaper is dead or
> missing but why would we care?

I agree with you: there is no reason we should care about a dead
child_reaper. The protection is already provided by
pidns_for_children_get(). We only need to prohibit creation of the
first task with pid != 1, which would lead to a child_reaper-less
pid namespace.
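
To make the invariant concrete, here is a minimal user-space sketch. It is
not kernel code and not the patch itself: struct toy_pid_ns and the
toy_ns_get(), toy_set_last_pid() and toy_alloc_pid() helpers are hypothetical
names invented for illustration. It assumes, as described above, that the
namespace handle can only be obtained after the first pid allocation, so the
setter itself needs no child_reaper check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a pid namespace; NOT the kernel's struct pid_namespace. */
struct toy_pid_ns {
	int last_pid;		/* most recently allocated pid, 0 if none yet */
	bool has_reaper;	/* set once pid 1 has been allocated */
};

/*
 * Hypothetical analogue of the open-time gate: the namespace can only be
 * picked once its first task (pid 1) exists.
 */
struct toy_pid_ns *toy_ns_get(struct toy_pid_ns *ns)
{
	assert(ns != NULL);
	return ns->has_reaper ? ns : NULL;
}

/*
 * Hypothetical analogue of the ioctl: no reaper check here, because the
 * caller could not have obtained the handle before pid 1 was allocated.
 */
int toy_set_last_pid(struct toy_pid_ns *ns, int pid)
{
	assert(ns != NULL);
	if (pid < 0)
		return -EINVAL;
	ns->last_pid = pid;
	return 0;
}

/* Hypothetical analogue of pid allocation: hand out last_pid + 1. */
int toy_alloc_pid(struct toy_pid_ns *ns)
{
	assert(ns != NULL);
	ns->last_pid += 1;
	if (ns->last_pid == 1)
		ns->has_reaper = true;	/* the first task becomes the reaper */
	return ns->last_pid;
}
```

In this model toy_ns_get() returns NULL until toy_alloc_pid() has handed out
pid 1, so toy_set_last_pid() is unreachable before that point. Afterwards,
setting last_pid to 99 makes the next allocation return 100. The model never
clears has_reaper, which mirrors the point that a reaper dying later does not
need to be checked.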