On Mon, Apr 15, 2019 at 2:26 PM Jonathan Kowalski <bl0pbl33p@xxxxxxxxx> wrote:
>
> On Mon, Apr 15, 2019 at 9:34 PM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> > I would personally *love* it if distros started setting no_new_privs
> > for basically all processes. And pidfd actually gets us part of the
> > way toward a straightforward way to make sudo and su still work in a
> > no_new_privs world: su could call into a daemon that would spawn the
> > privileged task, and su would get a (read-only!) pidfd back and then
> > wait for the fd and exit. I suppose that, done naively, this might
> > cause some odd effects with respect to tty handling, but I bet it's
> > solvable. I suppose it would be nifty if there were a way for a
>
> Hmm, isn't what you're describing roughly what systemd-run -t does? It
> will serialize the argument list, ask PID 1 to create a transient unit
> (go through the polkit stuff), and then set the stdout/stderr and
> stdin of the service to your tty, make it the controlling terminal of
> the process and reset it. So I guess it should work with sudo/su just
> fine too.
>
> There is also s6-sudod (and a s6-sudoc client to it) that works in a
> similar fashion, though it's a lot less fancy.

Cute. Now we just need distros to work out the kinks and to ship these
as sudo and su :)

>
> > process, by mutual agreement, to reparent itself to an unrelated
> > process.
> >
> > Anyway, clone(2) is an enormous mess. Surely the right solution here
> > is to have a whole new process creation API that takes a big,
> > extensible struct as an argument, and supports *at least* the full
> > abilities of posix_spawn() and ideally covers all the use cases for
> > fork() + do stuff + exec(). It would be nifty if this API also had a
> > way to say "add no_new_privs and therefore enable extra functionality
> > that doesn't work without no_new_privs". This functionality would
> > include things like returning a future extra-privileged pidfd that
> > gives ptrace-like access.
>
> My idea was that this intent could be supplied at clone time, you
> could attach ptrace access modes to a pidfd (we could make those a bit
> granular, perhaps) and any API that takes PIDs and checks against the
> caller's ptrace access mode could instead derive so from the pidfd.
> Since killing is a bit convoluted due to setuid binaries, that should
> work if one is CAP_KILL capable in the owning userns of the task, and
> if not that, has permissions to kill and the target has NNP set.

This CAP_KILL trick makes me nervous. This particular permission is
really quite powerful, and it would need some analysis to conclude that
it's not *more* powerful than CAP_KILL.

> This
> would allow you to bind kill privileges in a way that is compatible
> with both worlds, the upshot being NNP allows for the functionality to
> be available to a lot more of userspace. Of course, this would require
> a new clone version, possibly with taking a clone2 struct which sets a
> few parameters for the process and the flags for the pidfd.
>
> Another point is that you have a pidfd_open (or something else) that
> can create multiple pidfds from a pidfd obtained at clone time and
> create pidfds with varying levels of rights. It can also work by taking
> a TID to open a pidfd for an external task (and then for all the
> rights you wish to acquire on it, check against your ambient
> authority).

Indeed.

>
> (Actually, in general, having FMODE_* style bits spanning all methods
> a file descriptor can take (through system calls), with the type of
> object as key (class containing a set), and be able to enable/disable
> them and seal them would be a useful addition, this all happening at
> the struct file level instead of inode level sealing in memfds).

At the risk of saying a dirty word, the Windows API works quite a bit
like this :)
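
To make the rights-bits idea a little more concrete, here is a rough
Python model of a handle whose permitted operations live on the handle
itself, can only be narrowed when a new handle is derived from it, and
can be sealed against further changes. Everything here is made up for
illustration: WAIT, SIGNAL, PTRACE, Handle, derive(), drop() and seal()
are hypothetical names, not an existing kernel or pidfd interface, and
the real thing would sit on struct file rather than in userspace.

```python
# Hypothetical rights bits for illustration only; not real kernel flags.
WAIT = 1
SIGNAL = 2
PTRACE = 4


class Handle:
    """Toy model of a per-descriptor rights mask that can only shrink."""

    def __init__(self, rights):
        self.rights = rights
        self.sealed = False

    def derive(self, rights):
        """Return a new handle holding a subset of this handle's rights."""
        if rights & ~self.rights:
            raise PermissionError("cannot add rights when deriving a handle")
        return Handle(rights)

    def drop(self, rights):
        """Disable the given rights on this handle unless it is sealed."""
        if self.sealed:
            raise PermissionError("handle is sealed")
        self.rights &= ~rights

    def seal(self):
        """Freeze the current rights mask."""
        self.sealed = True

    def allows(self, right):
        """Report whether every bit in `right` is present on this handle."""
        return (self.rights & right) == right
```

In this model, the "(read-only!) pidfd" handed back to su would be
something like full.derive(WAIT) followed by seal(): the holder can wait
on the task, but cannot signal it and cannot widen the mask again.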