On 26.04.2017 19:32, Eric W. Biederman wrote: > Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> writes: > >> On 26.04.2017 19:11, Kirill Tkhai wrote: >>> On 26.04.2017 18:53, Oleg Nesterov wrote: >>>> On 04/17, Kirill Tkhai wrote: >>>>> >>>>> +struct pidns_ioc_req { >>>>> +/* Set vector of last pids in namespace hierarchy */ >>>>> +#define PIDNS_REQ_SET_LAST_PID_VEC 0x1 >>>>> + unsigned int req; >>>>> + void __user *data; >>>>> + unsigned int data_size; >>>>> + char std_fields[0]; >>>>> +}; >>>> >>>> see below, >>>> >>>>> +static long set_last_pid_vec(struct pid_namespace *pid_ns, >>>>> + struct pidns_ioc_req *req) >>>>> +{ >>>>> + char *str, *p; >>>>> + int ret = 0; >>>>> + pid_t pid; >>>>> + >>>>> + read_lock(&tasklist_lock); >>>>> + if (!pid_ns->child_reaper) >>>>> + ret = -EINVAL; >>>>> + read_unlock(&tasklist_lock); >>>>> + if (ret) >>>>> + return ret; >>>> >>>> why do you need to check ->child_reaper under tasklist_lock? this looks pointless. >>>> >>>> In fact I do not understand how it is possible to hit pid_ns->child_reaper == NULL, >>>> there must be at least one task in this namespace, otherwise you can't open a file >>>> which has f_op == ns_file_operations, no? >>> >>> Sure, it's impossible to pick a pid_ns, if there is no the pid_ns's tasks. I added >>> it under impression of >>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=dfda351c729733a401981e8738ce497eaffcaa00 >>> but here it's completely wrong. It will be removed in v2. >>> >>>>> + if (req->data_size >= PAGE_SIZE) >>>>> + return -EINVAL; >>>>> + str = vmalloc(req->data_size + 1); >>>> >>>> then I don't understand why it makes sense to use vmalloc() >>>> >>>>> + if (!str) >>>>> + return -ENOMEM; >>>>> + if (copy_from_user(str, req->data, req->data_size)) { >>>>> + ret = -EFAULT; >>>>> + goto out_vfree; >>>>> + } >>>>> + str[req->data_size] = '\0'; >>>>> + >>>>> + p = str; >>>>> + while (p && *p != '\0') { >>>>> + if (!ns_capable(pid_ns->user_ns, CAP_SYS_ADMIN)) { >>>>> + ret = -EPERM; >>>>> + goto out_vfree; >>>>> + } >>>>> + >>>>> + if (sscanf(p, "%d", &pid) != 1 || pid < 0 || pid > pid_max) { >>>>> + ret = -EINVAL; >>>>> + goto out_vfree; >>>>> + } >>>> >>>> Well, this is ioctl(), do we really want to parse the strings? >>>> >>>> Can't we make >>>> >>>> struct pidns_ioc_req { >>>> ... >>>> int nr_pids; >>>> pid_t pids[0]; >>>> } >>>> >>>> and just use get_user() in a loop? This way we can avoid vmalloc() or anything >>>> else altogether. >>> >>> Since it's a generic structure for different types of the requests, it may be extended >>> in the future. We won't be able to add new fields, if we compose the structure the way >>> you suggested, will we? >> >> Though, we may go this way if just do the fields generic: >> >> struct pidns_ioc_req { >> unsigned int req; >> unsigned int data_size; >> union { >> pid_t pid[0]; >> }; >> }; >> >> Ok, I'll rework the patchset in this way. > > You don't need that. That is what new ioctl numbers are for. > > Interfaces to the kernel don't need to become multiplexors to prepare > for the future when there is already an appropriate multiplexing > interface in place. That is, do you suggest to not introduce NS_SPECIFIC_IO from the first patch, and add PIDNS_REQ_SET_LAST_PID_VEC to the list of generic ns ioctls? ... #define NS_GET_OWNER_UID _IO(NSIO, 0x4) #define PIDNS_REQ_SET_LAST_PID_VEC _IO(NSIO, 0x5)