Hello Christian,

On 9/19/19 8:47 AM, Christian Brauner wrote:
> On Thu, Sep 19, 2019 at 06:04:55AM +0200, Michael Kerrisk (man-pages) wrote:

[...]

>>>> Thanks for that info. One other question springs to mind.
>>>> I haven't looked at the source or tried testing this,
>>>> but can anything actually be read() from a PIDFD? Presumably,
>>>
>>> We had discussed this but decided to not implement this right away.
>>> Mainly, because we did not have a clear picture what the semantics
>>> should be. But it is something that we will probably want in the
>>> future...
>>
>> That makes sense.
>>
>> A further question... We now have three ways of getting a
>> process file descriptor [*]:
>>
>>     open() of /proc/PID
>>     pidfd_open()
>>     clone()/clone3() with CLONE_PIDFD
>>
>> I thought the FD was supposed to be equivalent in all three cases.
>> However, if I try (on kernel 5.3) to poll() an FD returned by opening
>> /proc/PID, poll() tells me POLLNVAL for the FD. Is that difference
>> intentional? (I am guessing it is not.)
>
> It's intentional.
> The short answer is that /proc/<pid> is a convenience for sending
> signals.
> The longer answer is that this stems from a heavy debate about what a
> process file descriptor was supposed to be and some people pushing for
> at least being able to use /proc/<pid> dirfds while ignoring security
> problems as soon as you're talking about returning those fds from
> clone(); not to mention the additional problems discovered when trying
> to implement this.
> A "real" pidfd is one from CLONE_PIDFD or pidfd_open() and all features
> such as exit notification, read, and other future extensions will only
> be implemented on top of them.
> As much as we'd have liked to get rid of two different file descriptor
> types it doesn't hurt us much and is not that much different from what
> we will e.g. see with fsinfo() in the new mount api which needs to work
> on regular fds gotten via open()/openat() and mountfds gotten from
> fsopen() and fspick().
> The mountfds will also allow for advanced
> operations that the other ones will not. There's even an argument to be
> made that fds you will get from open()/openat() and openat2() are
> different types since they have very different behavior; openat2()
> returning fds that are not arbitrarily upgradable etc.

Okay. So, it would be fair to say (in the man pages) that pidfd_open()
is the preferred way of obtaining a PID file descriptor for an already
existing process?

>> [*] By the way, going forward, can we call these things
>> "process FDs", rather than "PID FDs"? The API names are what
>> they are, and that's okay, but just as we have socket
>> FDs that refer to sockets, directory FDs that refer to
>> directories, and timer FDs that refer to timers, and so on,
>> these are FDs that refer to *processes*, not "process IDs".
>> It's a little thing, but I think the naming is better, and
>> it's what I propose to use in the manual pages.
>
> The naming was another debate and we ended up with this compromise.
> I would just clarify that a pidfd is a process file descriptor. I
> wouldn't make too much of a deal of hiding the shortcut "pidfd". People
> are already using it out there in the wild and it's never proven a good
> idea to go against accepted practice.

Okay. I have a draft pidfd_open(2) page that I will send out soon.

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
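For illustration only (this code is not from the thread): a minimal sketch of the "real" pidfd behavior Christian describes, where a descriptor obtained with pidfd_open() supports exit notification through poll(). It assumes Linux 5.3 or later and invokes the raw system call through syscall(2), because older glibc versions provide no wrapper. The helper names `sys_pidfd_open` and `wait_child_via_pidfd` are hypothetical. The fallback syscall number 434 is an assumption that holds on most architectures but not on alpha. The sketch deliberately does not poll a /proc/PID descriptor, since the thread states that such descriptors are not meant to provide exit notification.

```c
#define _GNU_SOURCE
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434  /* assumption: common number; differs on alpha */
#endif

/* Hypothetical helper: raw pidfd_open() call (no glibc wrapper assumed). */
static int sys_pidfd_open(pid_t pid, unsigned int flags)
{
    return (int) syscall(SYS_pidfd_open, pid, flags);
}

/* Hypothetical helper: fork a child that exits with `code`, wait for it
 * via a pidfd, then reap it. Returns the child's exit status, or -1 if
 * pidfd_open() is unavailable (e.g. ENOSYS before 5.3) or a step fails. */
static int wait_child_via_pidfd(int code)
{
    pid_t child = fork();
    if (child == -1)
        return -1;
    if (child == 0)
        _exit(code);                 /* child: exit immediately */

    int pidfd = sys_pidfd_open(child, 0);
    if (pidfd == -1) {
        waitpid(child, NULL, 0);     /* still reap the child */
        return -1;
    }

    /* A pidfd reports POLLIN once the process has exited. */
    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    int ready = poll(&pfd, 1, 5000); /* 5 s timeout */
    close(pidfd);

    int status;
    if (waitpid(child, &status, 0) == -1)
        return -1;
    if (ready != 1 || !(pfd.revents & POLLIN))
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `wait_child_via_pidfd(7)` on a kernel with pidfd support should return 7. Note that the pidfd only signals exit; the child is still reaped with waitpid().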