On Mon, Apr 01, 2019 at 09:01:29AM -0700, Linus Torvalds wrote:
> On Mon, Apr 1, 2019 at 8:55 AM Daniel Colascione <dancol@xxxxxxxxxx> wrote:
> >
> >
> > > I wonder if we really want a fill procfs2, or maybe we could just make
> > > the pidfd readable (yes, it's a directory file descriptor, but we
> > > could allow reading).
> >
> > What would read(2) read?
>
> We could make it read anything, but it would have to be something
> people agree is sufficient (and not so expensive to create that rare
> users of that data would find the overhead excessive).
>
> Eg we could make it return the same thing that /proc/<pid>/status
> reads right now.
>
> But it sounds like you need pretty much all of /proc/<pid>/xyz:

From what I gather from this thread we are still best off using fds to
/proc/<pid> as pidfds. Linus, do you agree or have I misunderstood?

Yes, we can have an internal mount option to restrict access to various
parts of procfs from such pidfds, or do the parent-less bind-mount
trick, but I think this beats having a stunted dummy dirfd that we
implement a read method on.

One thing is that we also need something to disable access to
"/proc/<pid>/net". One option could be to give the files in "net/" an
->open handler which checks that our file->f_path.mnt is not one of our
special clone() mounts and, if it is, refuses the open.

To clarify the way forward: Jann and I were discussing whether
pidfd_open() still makes sense and whether I shouldn't just jump
straight to a first version of CLONE_PIDFD.

Basically, if you have a system without CONFIG_PROC_FS it makes sense
that clone() gives back an anon inode file descriptor as a pidfd,
because you can still signal threads in a race-free way. But it doesn't
make a lot of sense to have pidfd_open() in this scenario, because you
can't really do anything with that pidfd apart from sending signals.
And on a system like that sending a signal is still racy, since the
process can be recycled between learning the pid number and calling
pidfd_open() [1].

So it only makes sense to have _clone()_ give back anon_inode() fds on
a system without CONFIG_PROC_FS, but it doesn't make sense for
pidfd_open().

In other news, I think it makes more sense if I jump to the
implementation of CLONE_PIDFD instead of working on pidfd_open().

[1]: The only case - that seems rather far-fetched - where it makes
     sense is when the parent wants to create that pidfd and hand it
     to someone else.

Christian