On Wed, Mar 20, 2019 at 11:52 AM Christian Brauner <christian@xxxxxxxxxx> wrote:
>
> You're misunderstanding. Again, I said in my previous mails it should
> accept pidfds optionally as arguments, yes. But I don't want it to
> return the status fds that you previously wanted pidfd_wait() to return.
> I really want to see Joel's pidfd_wait() patchset and have more people
> review the actual code.

Just to make sure that no one is forgetting a material security
consideration:

$ ls /proc/self
attr             exe        mountinfo      projid_map    status
autogroup        fd         mounts         root          syscall
auxv             fdinfo     mountstats     sched         task
cgroup           gid_map    net            schedstat     timers
clear_refs       io         ns             sessionid     timerslack_ns
cmdline          latency    numa_maps      setgroups     uid_map
comm             limits     oom_adj        smaps         wchan
coredump_filter  loginuid   oom_score      smaps_rollup
cpuset           map_files  oom_score_adj  stack
cwd              maps       pagemap        stat
environ          mem        personality    statm

A bunch of this stuff makes sense to make accessible through a syscall
interface that we expect to be used even in sandboxes. But a bunch of
it does not. For example, *_map, mounts, mountstats, and net are all
namespace-wide things that certain policies expect to be unavailable.
stack, for example, is a potential attack surface. Etc.

As it stands, if you create a fresh userns and mountns and try to
mount /proc, there are some really awful and hideous rules that are
checked for security reasons. All these new APIs either need to
return something more restrictive than a proc dirfd or they need to
follow the same rules. And I'm afraid that the latter may be a
nonstarter if you expect these APIs to be used in libraries.

Yes, this is unfortunate, but it is indeed the current situation. I
suppose that we could return magic restricted dirfds, or we could
return things that aren't dirfds at all and have some API that gives
you the dirfd associated with a procfd but only if you can see
/proc/PID.

--Andy