On Wed, Mar 20, 2019 at 12:14 PM Christian Brauner <christian@xxxxxxxxxx> wrote:
>
> On Wed, Mar 20, 2019 at 11:58:57AM -0700, Andy Lutomirski wrote:
> > On Wed, Mar 20, 2019 at 11:52 AM Christian Brauner <christian@xxxxxxxxxx> wrote:
> > >
> > > You're misunderstanding. Again, I said in my previous mails it should
> > > accept pidfds optionally as arguments, yes. But I don't want it to
> > > return the status fds that you previously wanted pidfd_wait() to return.
> > > I really want to see Joel's pidfd_wait() patchset and have more people
> > > review the actual code.
> >
> > Just to make sure that no one is forgetting a material security consideration:
>
> Andy, thanks for commenting!
>
> >
> > $ ls /proc/self
> > attr             exe        mountinfo      projid_map    status
> > autogroup        fd         mounts         root          syscall
> > auxv             fdinfo     mountstats     sched         task
> > cgroup           gid_map    net            schedstat     timers
> > clear_refs       io         ns             sessionid     timerslack_ns
> > cmdline          latency    numa_maps      setgroups     uid_map
> > comm             limits     oom_adj        smaps         wchan
> > coredump_filter  loginuid   oom_score      smaps_rollup
> > cpuset           map_files  oom_score_adj  stack
> > cwd              maps       pagemap        stat
> > environ          mem        personality    statm
> >
> > A bunch of this stuff makes sense to make accessible through a syscall
> > interface that we expect to be used even in sandboxes. But a bunch of
> > it does not. For example, *_map, mounts, mountstats, and net are all
> > namespace-wide things that certain policies expect to be unavailable.
> > stack, for example, is a potential attack surface. Etc.

If you can access these files via open(2) on /proc/<pid>, you
should be able to access them via a pidfd. If you can't, you
shouldn't. Which /proc? The one you'd get by mounting procfs. I don't
see how pidfd makes any material changes to anyone's security. As far
as I'm concerned, if a sandbox can't mount /proc at all, it's just a
broken and unsupported configuration.

An actual threat model and real thought paid to access capabilities
would help. Almost everything around the interaction of Linux kernel
namespaces and security feels like a jumble of ad-hoc patches added as
afterthoughts in response to random objections.

> > All these new APIs either need to
> > return something more restrictive than a proc dirfd or they need to
> > follow the same rules.

What's wrong with the latter?

> > And I'm afraid that the latter may be a
> > nonstarter if you expect these APIs to be used in libraries.

What's special about libraries? How is a library any worse-off using
openat(2) on a pidfd than it would be just opening the file called
"/proc/$apid"?

> > Yes, this is unfortunate, but it is indeed the current situation. I
> > suppose that we could return magic restricted dirfds, or we could
> > return things that aren't dirfds and all and have some API that gives
> > you the dirfd associated with a procfd but only if you can see
> > /proc/PID.
>
> What would be your opinion to having a
> /proc/<pid>/handle
> file instead of having a dirfd. Essentially, what I initially proposed
> at LPC. The change on what we currently have in master would be:
> https://gist.github.com/brauner/59eec91550c5624c9999eaebd95a70df

And how do you propose, given one of these handle objects, getting a
process's current priority, or its current oom score, or its list of
memory maps? As I mentioned in my original email, and which nobody has
addressed, if you don't use a dirfd as your process handle or you
don't provide an easy way to get one of these proc directory FDs, you
need to duplicate a lot of metadata access interfaces.
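
To make the dirfd argument concrete, here is a rough sketch of what I mean
by metadata access through a process handle. It assumes, as this thread
does, that the handle is simply a directory fd on /proc/<pid>; the helper
names open_proc_dir() and read_proc_file() are made up for illustration and
are not part of any proposed API.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open /proc/<pid> as a directory fd. Under the
 * assumption above, this directory fd is the process handle. */
static int open_proc_dir(pid_t pid)
{
    char path[32];

    snprintf(path, sizeof path, "/proc/%d", (int)pid);
    return open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
}

/* Hypothetical helper: read one metadata file (e.g. "oom_score" or "maps")
 * relative to the handle with openat(2). Performs a single read of at most
 * len - 1 bytes, NUL-terminates buf, and returns the byte count, or -1 on
 * error. */
static ssize_t read_proc_file(int proc_dirfd, const char *name,
                              char *buf, size_t len)
{
    ssize_t n;
    int fd;

    if (len == 0)
        return -1;

    fd = openat(proc_dirfd, name, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    n = read(fd, buf, len - 1);
    close(fd);
    if (n < 0)
        return -1;

    buf[n] = '\0';
    return n;
}
```

With the fd returned by open_proc_dir(getpid()), a call such as
read_proc_file(fd, "oom_score", buf, sizeof buf) fetches the caller's own
oom score. Each openat(2) goes through the same procfs open path as opening
"/proc/<pid>/oom_score" by name, so no separate metadata interface is needed.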