On Tue, 09.02.21 15:57, Antonius Frie (antonius.frie@xxxxxxxxxxxxxxxxxx) wrote:

> Hi!
>
> So this is kind of a follow-up to the thread in [1], and the corresponding
> PR in [2].
>
> In short, the PR made some changes to allow for cases where /proc was not
> available in the mount namespace of the service, and added a test [3] to
> make sure that this would work. This test was later removed and rewritten to
> block /sys instead [4], because it turned out that having /proc unavailable
> sometimes caused problems with close_all_fds(), which is called in
> exec_child() after namespaces have been set up.
>
> On current master, services that don't have /proc mounted don't work at all
> anymore, since find_executable_full() ends up opening the given path and
> calling access_fd() on the resulting fd, and access_fd uses /proc/self/fd/*
> to turn the fd back into a path it can call access() on. As far as I can
> tell, the reason for not using access on the path directly is that access_fd
> is more elegant since it avoids a potential race condition.

Yes, we are trying to move to a mode where, for most things that
involve context switches/credential switches/domain transitions, we
operate via O_PATH file handles: i.e. resolve in our original context
until we only have fds pointing to the final thing, and then do the
final operation only on those fds. This should fix a bunch of races
and potential races for us.

> In addition to this, setup_private_users() also needs access to
> /proc/$pid/{uid_map, gid_map, setgroups} to do its job.

Yes, a multitude of Linux APIs are exposed via /proc/. I think outside
of trivial programs it's very hard to avoid having /proc/. glibc
internally encodes access to it all over the place too.
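
To make the dependency on /proc concrete, here is a minimal sketch of
the fd-based pattern described above, written in Python for brevity.
It is not systemd's C implementation: access_fd() only borrows the
name from the thread, and is_executable() is a hypothetical helper
made up for this illustration.

```python
import os


def access_fd(fd: int, mode: int) -> bool:
    """Check access rights on an already-opened fd.

    The fd is turned back into a path via /proc/self/fd/<fd>, so this
    only works when /proc is mounted in the current mount namespace.
    """
    return os.access(f"/proc/self/fd/{fd}", mode)


def is_executable(path: str) -> bool:
    """Hypothetical helper: resolve the path once, then check the fd."""
    # O_PATH pins the inode without opening the file for reading or
    # writing, so the check below refers to exactly what was resolved
    # here, even if the path is replaced in the meantime.
    fd = os.open(path, os.O_PATH | os.O_CLOEXEC)
    try:
        return access_fd(fd, os.X_OK)
    finally:
        os.close(fd)
```

Without /proc the /proc/self/fd/<fd> lookup has nothing to resolve, so
the check fails even for a perfectly valid executable.
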

> Given all this, I guess my question is whether it is still desirable to
> allow units to run without /proc, especially given that ProtectProc and
> ProcSubset exist now.* If not, it might be nice to just always mount /proc
> if it wouldn't otherwise be there (i.e. if RootImage/RootDirectory is used);
> currently, MountAPIVFS=yes is basically a required option because of this.
> (I guess you could mount proc manually, but then you can't use
> ProtectProc/ProcSubset.) I'm a bit unhappy about this, because MountAPIVFS
> also mounts /sys and /dev, and then you need separate options just to
> protect those again. Either way, maybe it would be good to explicitly state
> this requirement in the documentation?

We could add MountAPIVFS=proc or so as an alternative to yes/no, which
would only mount /proc. Note that on current git it actually also
mounts /run/, and that on current git it also defaults to true if
RootImage=/RootDirectory= are used, see
6119878480aab4c10ad6af33deab221778683807. You can still force
MountAPIVFS=no btw, to get back the status quo ante: i.e. a
RootImage=/RootDirectory= env without /proc.

> Anyway, I hope that this was okay to post here, I don't really know a lot
> about this and maybe there are good reasons for why things are the way they
> are. I'd be happy about feedback though.

Yes, this is the right place. If you think the MountAPIVFS=proc thing
would be desirable to you, consider posting an RFE issue asking for it
on github. Or even better, submit a PR.

> * Using both ProtectProc=ptraceable and ProcSubset=pid really doesn't
> let a lot of things through, and I don't think those interfere with any of
> the functions described above. The only thing I'm unsure about is
> setup_private_users(), since that spawns off a child process which then
> goes and writes to /proc/$parent_pid/, but I guess children can ptrace
> their parents? At least it seemed to work when I just tested it.

On traditional Linux, ptraceable means "uid matches".
With the Yama LSM, parents can ptrace their children but not vice versa.

Lennart

--
Lennart Poettering, Berlin
_______________________________________________
systemd-devel mailing list
systemd-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/systemd-devel