On Mon, Apr 15, 2019 at 03:52:48PM +0200, Christian Brauner wrote: > On Mon, Apr 15, 2019 at 03:24:16PM +0200, Oleg Nesterov wrote: > > On 04/15, Christian Brauner wrote: > > > > > > > CLONE_PARENT_SETTID doesn't look very usefule, so what if we add > > > > > > > > if ((clone_flags & (CLONE_PIDFD|CLONE_PARENT_SETTID)) == > > > > (CLONE_PIDFD|CLONE_PARENT_SETTID)) > > > > return ERR_PTR(-EINVAL); > > > > > > > > at the start of copy_process() ? > > > > > > > > Then it can do > > > > > > > > if (clone_flags & CLONE_PIDFD) { > > > > retval = pidfd_create(pid, &pidfdf); > > > > if (retval < 0) > > > > goto bad_fork_free_pid; > > > > retval = put_user(retval, parent_tidptr) > > > > if (retval < 0) > > > > goto bad_fork_free_pid; > > > > } > > > > > > Uhhh Oleg, that is nifty. I have to say I like that a lot. This would > > > let us return the pid and the pidfd in one go and we can also start > > > pidfd numbering at 0. > > > > Christian, sorry if it was already discussed, but I can't force myself to > > read all the previous discussions ;) > > > > If we forget about CONFIG_PROC_FS, why do we really want to create a file? > > > > > > Suppose we add a global u64 counter incremented by copy_process and reported > > in /proc/$pid/status. Suppose that clone(CLONE_PIDFD) writes this counter to > > *parent_tidptr. Let's denote this counter as UNIQ_PID. > > > > Now, if you want to (say) safely kill a task and you have its UNIQ_PID, you > > can do > > > > kill_by_pid_uniq(int pid, u64 uniq_pid) > > { > > pidfd = open("/proc/$pid", O_DIRECTORY); > > > > status = openat(pidfd, "status"); > > u64 this_uniq_pid = ... read UNIQ_PID from status ...; > > > > if (uniq_pid != this_uniq_pid) > > return; > > > > pidfd_send_signal(pidfd); > > } > > > > Why else do we want pidfd? > > I think this was thrown around at one point but this is rather > inelegant imho. It basically makes a process unique by using a > combination of two identifiers. You end up with a similar concept but > you make it way less flexible and extensible imho. With pidfds you can > not care about pids at all if you don't want to. The UNIQ_PID thing > would require you to always juggle two identifiers. > > Your example would also only work if CONFIG_PROC_FS is set (Not sure if > that's what you meant by "forget about CONFIG_PROC_FS")? Say, you get > a pid from clone() and your UNIQ_PID thing. Then you still can't > reliably kill a process because pidfd_send_signal() is not useable since > you can't get pidfds. And if you go the kill way you end up with the same > problem. Yes, you could solve this by probably extending syscalls to > take a UNIQ_PID argument but that seems very inelegant. > > The UNIQ_PID implementation would also require being tracked in the > kernel either in task_struct or struct pid potentially and thus would > probably add more infrastructure in the kernel. We don't need any of > that if we simply rely on pidfds. > > Most of all, the pidfd concept allows one way more flexibility in > extending it. For example, Joel is working on a patchset to make pidfds > pollable so you can get information about a process death by polling > them. We also want to be able to potentially wait on a process with > waitid(W_PIDFD) or similar as suggested by Linus in earlier threads. At > that point you end up in a similar situation as tgkill() where you pass > a tgid and a pid already to make sure that the pid you pass has the tgid > as thread-group leader. That is all way simpler with pidfds. I agree the pidfd file descriptor approach is simpler than dealing with 2 pids and is needed for the poll notification support I posted. Also in the future it allows for a pidfd to sent over IPC to another process using binder or unix domain sockets. thanks, - Joel