On Mon, Mar 25, 2019 at 9:15 PM Daniel Colascione <dancol@xxxxxxxxxx> wrote:
> On Mon, Mar 25, 2019 at 12:42 PM Jonathan Kowalski <bl0pbl33p@xxxxxxxxx> wrote:
> > On Mon, Mar 25, 2019 at 6:57 PM Daniel Colascione <dancol@xxxxxxxxxx> wrote:
[...]
> > Yes, but everything in /proc is not equivalent to an attribute, or an
> > option, and depending on its configuration, you may not want to allow
> > processes to even be able to see /proc for any PIDs other than those
> > running as their own user (hidepid). This means, even if this new
> > system call is added, to respect hidepid, it must, depending on if
> > /proc is mounted (and what hidepid is set to, and what gid= is set
> > to), return EPERM, because then there is a discrepancy between how the
> > two entrypoints to acquire a process handle do access control.
>
> That's why I proposed that this translation mechanism accept a procfs
> root directory --- so you'd specify *which* procfs you want and let
> the kernel apply whatever hidepid access restrictions it wants.
[...]
> > > and 2) it's
> > > "fail unsafe": IMHO, most users in practice will skip the line marked
> > > "LIVENESS CHECK", and as a result, their code will appear to work but
> > > contain subtle race conditions. An explicit interface to translate
> > > from a (PIDFD, PROCFS_ROOT) tuple to a /proc/pid directory file
> > > descriptor would be both more efficient and fail-safe.
> > >
> > > [1] as a separate matter, it'd be nice to have a batch version of close(2).
> >
> > Since /proc is full of gunk,
>
> People keep saying /proc is bad, but I haven't seen any serious
> proposals for a clean replacement. :-)
>
> > how about adding more to it and making
> > the magic symlink of /proc/self/fd for the pidfd to lead to the dirfd
> > of the /proc entry of the process it maps to, when one uses
> > O_DIRECTORY while opening it? Otherwise, it behaves as it does today.
> > It would be equivalent to opening the proc entry with usual access
> > restrictions (and hidepid made to work) but without the races, and
> > because for processes outside your and children pid ns, it shouldn't
> > work anyway, and since they wouldn't have their entry on this procfs
> > instance, it would all just fit in nicely?
>
> Thanks. That'll work. It's a bit magical, but /proc/self/fd is magical
> anyway, so that's okay.

Please don't do that. /proc/$pid/fd refers to the set of file
descriptors the process has open, and semantically doesn't have much
to do with the identity of the process. If you want to have a procfs
directory entry for getting a pidfd, please add a new entry. (Although
I don't see the point in adding a new procfs entry for this when you
could instead have an ioctl or syscall operating on the procfs
directory fd.)
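
To make the distinction concrete, here is a rough userspace sketch. It
uses only interfaces that exist today (opening /proc/<pid> with
O_DIRECTORY and doing openat-style lookups relative to that fd); it is
not the ioctl or syscall discussed above, whose shape is still
undecided. The helper names open_proc_dir, read_relative and
list_open_fds are made up for illustration. The directory fd is what
stands for the process, while the "fd" subdirectory under it is just
an enumeration of that process's open descriptors.

```python
import os

def open_proc_dir(pid="self", proc_root="/proc"):
    """Open <proc_root>/<pid> as a directory fd (hypothetical helper).

    proc_root selects which procfs instance is used, so that instance's
    mount options (for example hidepid) apply to the open.
    """
    path = os.path.join(proc_root, str(pid))
    return os.open(path, os.O_RDONLY | os.O_DIRECTORY)

def read_relative(dirfd, name):
    """Read a file relative to an already-open /proc/<pid> directory fd."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dirfd)
    with os.fdopen(fd) as f:  # fdopen takes ownership and closes fd
        return f.read()

def list_open_fds(dirfd):
    """List the descriptor numbers under <dirfd>/fd.

    This is the set of files the process has open, not a handle on the
    process itself.
    """
    sub = os.open("fd", os.O_RDONLY | os.O_DIRECTORY, dir_fd=dirfd)
    try:
        return sorted(int(n) for n in os.listdir(sub))
    finally:
        os.close(sub)

if __name__ == "__main__":
    d = open_proc_dir()
    print(read_relative(d, "status").splitlines()[0])
    print(list_open_fds(d))
    os.close(d)
```

Lookups through the directory fd stay tied to the process that was
opened: once that process has exited they are expected to fail rather
than resolve to whatever later reuses the PID number. That is the
property a pidfd-to-procfs translation would need to preserve.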