Hi Christian, For next revision, could you also CC surenb@xxxxxxxxxx as well? He is also working on the low memory killer. And also suggest CC to kernel-team@xxxxxxxxxxx. And mentioned some comments below, thanks. On Thu, May 16, 2019 at 03:59:42PM +0200, Christian Brauner wrote: [snip]
diff --git a/kernel/pid.c b/kernel/pid.c index 20881598bdfa..4afca3d6dcb8 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -38,6 +38,7 @@ #include <linux/syscalls.h> #include <linux/proc_ns.h> #include <linux/proc_fs.h> +#include <linux/sched/signal.h> #include <linux/sched/task.h> #include <linux/idr.h> @@ -451,6 +452,55 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns) return idr_get_next(&ns->idr, &nr); } +/** + * pidfd_open() - Open new pid file descriptor. + * + * @pid: pid for which to retrieve a pidfd + * @flags: flags to pass + * + * This creates a new pid file descriptor with the O_CLOEXEC flag set for + * the process identified by @pid. Currently, the process identified by + * @pid must be a thread-group leader. This restriction currently exists + * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot + * be used with CLONE_THREAD) and pidfd polling (only supports thread group + * leaders). + * + * Return: On success, a cloexec pidfd is returned. + * On error, a negative errno number will be returned. + */ +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags) +{ + int fd, ret; + struct pid *p; + struct task_struct *tsk; + + if (flags) + return -EINVAL; + + if (pid <= 0) + return -EINVAL; + + p = find_get_pid(pid); + if (!p) + return -ESRCH; + + ret = 0; + rcu_read_lock(); + /* + * If this returns non-NULL the pid was used as a thread-group + * leader. Note, we race with exec here: If it changes the + * thread-group leader we might return the old leader. + */ + tsk = pid_task(p, PIDTYPE_TGID);
Just trying to understand the comment here. The issue is that we might either return the new leader, or the old leader depending on the overlap with concurrent de_thread right? In either case, we don't care though. I suggest to remove the "Note..." part of the comment since it doesn't seem the race is relevant here unless we are doing something else with tsk in the function, but if you want to keep it that's also fine. Comment text should probably should be 'return the new leader' though.
+ if (!tsk) + ret = -ESRCH;
Perhaps -EINVAL? AFAICS, this can only happen if a CLONE_THREAD pid was passed as argument to pidfd_open which is invalid. But let me know what you had in mind.. thanks, - Joel
+ rcu_read_unlock(); + + fd = ret ?: pidfd_create(p); + put_pid(p); + return fd; +} + void __init pid_idr_init(void) { /* Verify no one has done anything silly: */ -- 2.21.0