On Sun, Mar 31, 2019 at 02:09:03PM -0600, Andy Lutomirski wrote:
>
>
> > On Mar 30, 2019, at 11:24 AM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> >> On Sat, Mar 30, 2019 at 10:12 AM Christian Brauner <christian@xxxxxxxxxx> wrote:
> >>
> >>
> >> To clarify, what the Android guys really wanted to be part of the api is
> >> a way to get race-free access to metadata associated with a given pidfd.
> >> And the idea was that *if and only if procfs is mounted* you could do:
> >>
> >> int pidfd = pidfd_open(1234, 0);
> >>
> >> int procfd = open("/proc", O_RDONLY | O_CLOEXEC);
> >> int procpidfd = ioctl(pidfd, PIDFD_TO_PROCFD, procfd);
> >
> > And my claim is that this is three system calls - one of them very
> > hacky - to just do
> >
> >    int pidfd = open("/proc/%d", O_PATH);
>
> Hi Linus-
>
> I want to re-check this because I think Christian’s example was bad. I proposed these ioctls, but that wasn’t the intended use. The real point is:

Getting metadata access was pushed as essential originally which is why
this ioctl() came up in the first place. The concerns about CLONE_PIDFD
were not relevant when this came up [1]:

<quote>
> And how do you propose, given one of these handle objects, getting a
> process's current priority, or its current oom score, or its list of
> memory maps? As I mentioned in my original email, and which nobody has
> addressed, if you don't use a dirfd as your process handle or you
> don't provide an easy way to get one of these proc directory FDs, you
> need to duplicate a lot of metadata access interfaces.

An API that takes a process handle object and an fd pointing at /proc
(the root of the proc fs) and gives you back a proc dirfd would do the
trick. You could do this with no new kernel features at all if you're
willing to read the pid, call openat(2), and handle the races in user
code.
</quote>

[1]: https://lore.kernel.org/lkml/CALCETrUFrFKC2YTLH7ViM_7XPYk3LNmNiaz6s8wtWo1pmJQXzg@xxxxxxxxxxxxxx/

>
> int pidfd = new_improved_clone(...);
>
> To be useful, this type of API *must* work without proc mounted.
>
> And, later:
>
> openat(fd to pidfd’s proc directory, "status", ...);
>
> And we want a non-utterly-crappy way to do this. The ioctl is certainly ugly, but it *works*.
>
> Another approach is:
>
> pid_t pid = pidfd_get_pid(pidfd);
> sprintf(buf, "/proc/%d", pid);
> int procfd = open(buf, O_PATH);
> if (pidfd_get_pid(pidfd) != pid) {
>   we lose;
> }
>
> But this is clunky.
>
> Do you think the clunky version is okay, or do you have a suggestion for making it better?
>
> --Andy
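
For illustration only, here is a minimal sketch of the "clunky" userspace
variant quoted above, written against an already-open /proc fd as in the
quoted passage [1]. pidfd_get_pid() is not an existing kernel or libc
interface, so the sketch takes it as a function pointer; the names
pidfd_get_pid_fn, pidfd_to_procfd_checked() and self_pid_stub() are
hypothetical, and the re-check only does its job if the lookup stops
returning the old pid once the process is gone.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for O_PATH */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: stands in for the pidfd_get_pid() named in the mail. */
typedef pid_t (*pidfd_get_pid_fn)(int pidfd);

/* Demo stub only: ignores the pidfd and reports the calling process. */
static pid_t self_pid_stub(int pidfd)
{
	(void)pidfd;
	return getpid();
}

/*
 * Open <procfd>/<pid> as an O_PATH directory fd, then look the pid up
 * again. Returns the new fd, or -1 on any failure or if the pid changed
 * between the two lookups (the "we lose" case).
 */
static int pidfd_to_procfd_checked(int pidfd, int procfd,
				   pidfd_get_pid_fn get_pid)
{
	char name[32];
	pid_t pid = get_pid(pidfd);
	int fd;

	if (pid <= 0)
		return -1;

	snprintf(name, sizeof(name), "%d", (int)pid);
	fd = openat(procfd, name, O_PATH | O_DIRECTORY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	/* Re-check: a different answer means the pid may have been recycled. */
	if (get_pid(pidfd) != pid) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A caller would then do openat(fd, "status", O_RDONLY) on the result. This
still needs procfs mounted, which is the limitation the thread is about.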