Hi!
So this is kind of a follow-up to the thread in [1], and the
corresponding PR in [2].
In short, the PR made some changes to allow for cases where /proc was
not available in the mount namespace of the service, and added a test
[3] to make sure that this would work. This test was later removed and
rewritten to block /sys instead [4], because it turned out that having
/proc unavailable sometimes caused problems with close_all_fds(), which
is called in exec_child() after namespaces have been set up.
On current master, services that don't have /proc mounted don't work at
all anymore: find_executable_full() opens the given path and calls
access_fd() on the resulting fd, and access_fd() uses /proc/self/fd/* to
turn the fd back into a path it can call access() on. As far as I can
tell, the reason for not calling access() on the path directly is that
checking the already-opened fd avoids a potential race in which the path
is replaced between the check and the later use.
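To illustrate the /proc dependency described above, here is a minimal
sketch of that kind of fd-based check. It is not the actual systemd
code: the name access_fd_sketch() and the buffer size are made up for
illustration, and I'm assuming the real helper works roughly like this.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for access_fd(); not the real systemd code.
 * Returns 0 on success or a negative errno-style value on failure. */
static int access_fd_sketch(int fd, int mode) {
    char path[64];

    if (fd < 0)
        return -EBADF;

    /* /proc/self/fd/<N> is a symlink to whatever the fd refers to. */
    snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);

    /* access() follows the symlink, so the check applies to the object
     * that was opened. If /proc is not mounted, the path does not exist
     * and this fails (typically with ENOENT), whatever the fd is. */
    if (access(path, mode) < 0)
        return -errno;

    return 0;
}
```

With /proc mounted, calling this on an fd for an existing file with
F_OK should return 0; without /proc, it fails even though the fd itself
is perfectly valid, which is the failure mode described above.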
In addition, setup_private_users() needs access to
/proc/$pid/{uid_map,gid_map,setgroups} to do its job.
Given all this, I guess my question is whether it is still desirable to
allow units to run without /proc, especially given that ProtectProc and
ProcSubset exist now.* If not, it might be nice to just always mount
/proc if it wouldn't otherwise be there (i.e. if RootImage/RootDirectory
is used); currently, MountAPIVFS=yes is basically a required option
because of this. (I guess you could mount proc manually, but then you
can't use ProtectProc/ProcSubset.) I'm a bit unhappy about this, because
MountAPIVFS also mounts /sys and /dev, and then you need separate
options just to protect those again. Either way, maybe it would be good
to explicitly state this requirement in the documentation?
Anyway, I hope that this was okay to post here. I don't really know a
lot about this, and maybe there are good reasons for why things are the
way they are. I'd be happy about feedback though.
Cheers,
Antonius
* Using both ProtectProc=ptraceable and ProcSubset=pid hides most of
/proc, and I don't think either option interferes with any of the
functions described above. The only thing I'm unsure about is
setup_private_users(), since that spawns off a child process which then
goes and writes to /proc/$parent_pid/, but I guess children can ptrace
their parents? At least it seemed to work when I just tested it.
[1]: https://lists.freedesktop.org/archives/systemd-devel/2017-April/038634.html
[2]: https://github.com/systemd/systemd/pull/5985
[3]: https://github.com/systemd/systemd/pull/6017
[4]: https://github.com/systemd/systemd/commit/054d871d41039fcfc1a4a661c979941b9660c9e6