Dear list,

I have a question about the intended use of the RuntimeDirectory directive for what I'd call ephemeral chroot environments. I would also like some clarification about RestrictAddressFamilies, but that is related only in that it came up while hardening a socket-activated service unit. If it is better to handle that in a separate thread, just tell me and I will create one.

A bit of context: I only recently found out about `systemd-analyze security` and used it to harden the `unbound.service` that comes with Ubuntu 24.04 LTS. While I was at it, I came up with a way to get a chroot environment whose lifetime is limited to the runtime of the service itself, like so:

    [Service]
    DynamicUser=yes
    # Only confine ExecStart=, so we need neither Bind* unbound-control nor the unix socket
    RootDirectoryStartOnly=yes
    # Use systemd's chroot capabilities
    RootDirectory=%t/%N
    # put it in a runtime directory so it leaves no trace after exit and need not exist
    # this should be the same path as above
    RuntimeDirectory=%N
    # But it is not an actual runtime directory meant to be writeable
    UnsetEnvironment=RUNTIME_DIRECTORY
    ReadOnlyPaths=+%t/%N

For reference, see the full output of `systemctl cat unbound.service` in the attachments. I hope the comments carry the intent.

The idea is to have the chroot created when the service starts and destroyed when it exits. There seems to be no other way to do this than to (ab)use `RuntimeDirectory=` as shown above. I've tried this with `TemporaryFileSystem=`, but that requires that the directory already exists.

So my main concern is whether there are any side effects I am not aware of that might come back to bite me. So far this unit runs like a charm with the least privileges I could manage to get working. `systemd-analyze security` shows an exposure level of "1.1 OK", so that's pretty good.
And thanks, by the way, for this tool; it is great for finding things that I, as well as upstream and/or Canonical, was not even aware of. I am even considering sending these changes as enhancements to one or both of the latter, but that depends on what you, the experts, tell me.

My rationale for doing this is that the unit becomes self-contained: nothing needs to happen outside of it to set up its runtime environment.

The `UnsetEnvironment=RUNTIME_DIRECTORY` and `ReadOnlyPaths=+%t/%N` lines may be unnecessary, since I don't expect unbound to use the directory anyway. Then again, this is about hardening, and my thinking was that a compromised service might be able to make use of it. I also wanted to know whether this creates a loop, since `+%t/%N` would point back at the RootDirectory, but I was unable to actually see the contents of `+%t/%N`, if any. I am not quite sure why I could not see anything in there, but I suspect that this has to do with it being `unshare`d. Additional info on this would be very welcome too, even if it's just an RTFM pointer; the documentation is kind of overwhelming, after all.

I hope this covers my main question.

As for the `RestrictAddressFamilies` directive, I want to know whether it is even possible to restrict AF_{NETLINK,UNIX,INET,INET6} when a service is socket-activated. I somehow got the idea that the service executable should not need to do any binding itself, since that should already have happened when the corresponding socket unit was started. But this seems impossible with unbound. Or am I misunderstanding how this works? Is it at least theoretically possible? I half suspect that it is, but that unbound does not support it because systemd integration seems to be an afterthought there. If this is only due to some missing integration on unbound's part, I would like to get that upstreamed, because then I could check three more boxes in `systemd-analyze security`.
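
To make concrete what I mean by the service not needing to bind anything itself, here is a minimal Python sketch of how I understand the handover of activated sockets, assuming the `LISTEN_PID`/`LISTEN_FDS` environment protocol described in sd_listen_fds(3). The names `inherited_fds` and `adopt` are made up for illustration; this is not how unbound does it, just my mental model of the receiving side:

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor, per sd_listen_fds(3)


def inherited_fds(environ=os.environ, pid=None):
    """Return the descriptor numbers passed in by the service manager.

    Hypothetical helper: it mirrors the LISTEN_PID/LISTEN_FDS check from
    sd_listen_fds(3), but does not unset the variables afterwards.
    """
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []  # the descriptors were meant for another process
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # variables missing or malformed: not socket-activated
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))


def adopt(fds):
    """Wrap already-bound descriptors instead of creating and binding new sockets."""
    return [socket.socket(fileno=fd) for fd in fds]


if __name__ == "__main__":
    # Prints nothing when the process was not socket-activated.
    for sock in adopt(inherited_fds()):
        print(sock.family, sock.type, sock.getsockname())
```

If that model is right, a process started via `unbound.socket` would find one descriptor per Listen* line (so presumably 3, 4 and 5) and would not have to create or bind the listening sockets itself.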

For reference, I am attaching the output of `systemctl cat unbound.{service,socket}` and `systemd-analyze security unbound.service`.

`systemd --version`:

    systemd 255 (255.4-1ubuntu8.5)
    +PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified

Thanks for your time and, of course, systemd!

Peter
# /etc/systemd/system/unbound.socket
# /etc/systemd/system/unbound.service
[Unit]
Description=Socket(s) (including control) for Unbound DNS server
Documentation=man:unbound(8)
DefaultDependencies=no
After=systemd-sysusers.service
Requires=sysinit.target
Conflicts=shutdown.target
#Before=systemd-resolved.service sysinit.target network.target nss-lookup.target shutdown.target
Before=systemd-resolved.service nss-lookup.target shutdown.target

[Socket]
ListenDatagram=127.0.0.1:53
ListenStream=127.0.0.1:53
ListenStream=%t/unbound-control/%N.ctl
SocketGroup=%N
SocketMode=0660

[Install]
WantedBy=sockets.target
# /etc/systemd/system/unbound.service
[Unit]
Description=Unbound DNS server
Documentation=man:unbound(8)
After=network.target
Before=nss-lookup.target
Wants=nss-lookup.target

[Service]
Type=notify
Restart=on-failure
EnvironmentFile=-/etc/default/unbound
ExecStartPre=-/usr/libexec/unbound-helper chroot_setup
ExecStartPre=-/usr/libexec/unbound-helper root_trust_anchor_update
ExecStart=/usr/sbin/unbound -d -p $DAEMON_OPTS
ExecStopPost=-/usr/libexec/unbound-helper chroot_teardown
ExecReload=+/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/unbound.service.d/override-hardening.conf
# /etc/systemd/system/unbound.service.d/override.conf
# This overrides the service unit that comes with Ubuntu in an effort to
# maximize security.
[Unit]
# For verifying keys, time should be synced first
After=time-sync.target

[Service]
# do not run chroot helper
ExecStartPre=
# This is done by external service unit, see additional *-hardening.conf
# ExecStartPre=-/usr/libexec/unbound-helper root_trust_anchor_update
ExecStart=
# Don't daemonize and send output to stderr for journald
ExecStart=unbound -ddp $DAEMON_OPTS
# No daemonizing and hence no PID file necessary, see -d and -p in unbound(8)
PIDFile=
# No chroot teardown helper necessary
ExecStopPost=
ExecReload=
# This at least waits for a reply in contrast to just sending HUP
ExecReload=unbound-control -q reload

[Install]
# Don't install any dependencies since we rely solely on socket activation
WantedBy=
Also=%N.socket

# /etc/systemd/system/unbound.service.d/zz-override-hardening.conf
# This overrides the service unit that comes with Ubuntu in an effort to
# maximize security.
[Unit]
# Use external bootstrapping b/c this one is too restrictive
Requires=%N-trust-anchor-update.service
After=%N-trust-anchor-update.service
# Double check if it worked
AssertPathExists=%S/%N/root.key
# Hard dependency on socket because we have no privileges
BindsTo=%N.socket
#After=<unnecessary b/c sockets have implicit Before=>

[Service]
DynamicUser=yes
# Only confine ExecStart=, so we need neither Bind* unbound-control nor the unix socket
RootDirectoryStartOnly=yes
# Use systemd's chroot capabilities
RootDirectory=%t/%N
# put it in a runtime directory so it leaves no trace after exit and need not exist
# this should be the same path as above
RuntimeDirectory=%N
# But it is not an actual runtime directory meant to be writeable
UnsetEnvironment=RUNTIME_DIRECTORY
ReadOnlyPaths=+%t/%N
# root.key lives in StateDirectory, i.e. /var/lib/unbound or /var/lib/private/unbound, in
# case of running as dynamic user
StateDirectory=%N
ConfigurationDirectory=%N
## Binaries
# need unbound-control for reload action and the control socket
#BindReadOnlyPaths=/usr/sbin/unbound /usr/sbin/unbound-control %t/%N-control/%N.ctl
BindReadOnlyPaths=/usr/sbin/unbound
# For some reason unbound needs access to this too
# lest it complain about systemd not running?
BindReadOnlyPaths=%t/systemd/system
BindReadOnlyPaths=/etc/ssl/certs/ca-certificates.crt
## required shared objects
# linker
BindReadOnlyPaths=/lib64/ld-linux-x86-64.so.2
# printf 'BindReadOnlyPaths=%s\n' $(ldd $(command which unbound) |
#   sed -nE 's/.* => (.*) \([^)]*\)$/\1/p')
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libc.so.6
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libcap.so.2
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libcrypto.so.3
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libevent-2.1.so.7
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libexpat.so.1
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libgcrypt.so.20
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libgpg-error.so.0
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libhiredis.so.1.1.0
BindReadOnlyPaths=/lib/x86_64-linux-gnu/liblz4.so.1
BindReadOnlyPaths=/lib/x86_64-linux-gnu/liblzma.so.5
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libm.so.6
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libnghttp2.so.14
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libprotobuf-c.so.1
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libpython3.12.so.1.0
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libssl.so.3
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libsystemd.so.0
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libz.so.1
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libzstd.so.1
ProtectClock=yes
# Implied by DynamicUser=yes
ProtectSystem=strict
ProtectHome=yes
PrivateDevices=yes
ProtectKernelTunables=yes
ProtectControlGroups=yes
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6 AF_NETLINK
CapabilityBoundingSet=
SystemCallFilter=~@clock @cpu-emulation @debug @module @mount @obsolete @privileged @raw-io @reboot @resources @swap
#SystemCallFilter=~@clock @cpu-emulation @debug @module @mount @obsolete @raw-io @reboot @resources @swap
SystemCallArchitectures=native
# only necessary without socket activation and unbound doing chroot
#CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_SETUID CAP_SETGID CAP_CHOWN
LockPersonality=yes
NoNewPrivileges=yes
PrivateUsers=yes
ProtectHostname=yes
ProtectKernelLogs=yes
ProtectKernelModules=yes
ProtectProc=invisible
MountAPIVFS=yes
ProcSubset=pid
MemoryDenyWriteExecute=yes
RestrictRealtime=yes
RestrictNamespaces=yes
UMask=077

[Install]
# Don't install any dependencies since we rely solely on socket activation
WantedBy=
  NAME                                                         DESCRIPTION                                                                     EXPOSURE
✓ SystemCallFilter=~@swap                                      System call deny list defined for service, and @swap is included
✓ SystemCallFilter=~@resources                                 System call deny list defined for service, and @resources is included
✓ SystemCallFilter=~@reboot                                    System call deny list defined for service, and @reboot is included
✓ SystemCallFilter=~@raw-io                                    System call deny list defined for service, and @raw-io is included
✓ SystemCallFilter=~@privileged                                System call deny list defined for service, and @privileged is included
✓ SystemCallFilter=~@obsolete                                  System call deny list defined for service, and @obsolete is included
✓ SystemCallFilter=~@mount                                     System call deny list defined for service, and @mount is included
✓ SystemCallFilter=~@module                                    System call deny list defined for service, and @module is included
✓ SystemCallFilter=~@debug                                     System call deny list defined for service, and @debug is included
✓ SystemCallFilter=~@cpu-emulation                             System call deny list defined for service, and @cpu-emulation is included
✓ SystemCallFilter=~@clock                                     System call deny list defined for service, and @clock is included
✓ RemoveIPC=                                                   Service user cannot leave SysV IPC objects around
✓ User=/DynamicUser=                                           Service runs under a transient non-root user identity
✓ RestrictRealtime=                                            Service realtime scheduling access is restricted
✓ CapabilityBoundingSet=~CAP_SYS_TIME                          Service processes cannot change the system clock
✓ NoNewPrivileges=                                             Service processes cannot acquire new privileges
✓ AmbientCapabilities=                                         Service process does not receive ambient capabilities
✓ CapabilityBoundingSet=~CAP_BPF                               Service may load BPF programs
✓ SystemCallArchitectures=                                     Service may execute system calls only with native ABI
✗ RestrictAddressFamilies=~AF_NETLINK                          Service may allocate netlink sockets                                            0.1
✗ RestrictAddressFamilies=~AF_UNIX                             Service may allocate local sockets                                              0.1
✗ RestrictAddressFamilies=~AF_(INET|INET6)                     Service may allocate Internet sockets                                           0.3
✓ ProtectSystem=                                               Service has strict read-only access to the OS file hierarchy
✓ ProtectProc=                                                 Service has restricted access to process tree (/proc hidepid=)
✓ SupplementaryGroups=                                         Service has no supplementary groups
✓ CapabilityBoundingSet=~CAP_SYS_RAWIO                         Service has no raw I/O access
✓ CapabilityBoundingSet=~CAP_SYS_PTRACE                        Service has no ptrace() debugging abilities
✓ CapabilityBoundingSet=~CAP_SYS_(NICE|RESOURCE)               Service has no privileges to change resource use parameters
✓ CapabilityBoundingSet=~CAP_NET_ADMIN                         Service has no network configuration privileges
✓ CapabilityBoundingSet=~CAP_NET_(BIND_SERVICE|BROADCAST|RAW)  Service has no elevated networking privileges
✓ CapabilityBoundingSet=~CAP_AUDIT_*                           Service has no audit subsystem access
✓ CapabilityBoundingSet=~CAP_SYS_ADMIN                         Service has no administrator privileges
✓ PrivateTmp=                                                  Service has no access to other software's temporary files
✓ ProcSubset=                                                  Service has no access to non-process /proc files (/proc subset=)
✓ CapabilityBoundingSet=~CAP_SYSLOG                            Service has no access to kernel logging
✓ ProtectHome=                                                 Service has no access to home directories
✓ PrivateDevices=                                              Service has no access to hardware devices
✓ RootDirectory=/RootImage=                                    Service has its own root directory/image
✗ PrivateNetwork=                                              Service has access to the host's network                                        0.5
✗ DeviceAllow=                                                 Service has a device ACL with some special devices: char-rtc:r                  0.1
✓ KeyringMode=                                                 Service doesn't share key material with other services
✓ Delegate=                                                    Service does not maintain its own delegated control group subtree
✓ PrivateUsers=                                                Service does not have access to other users
✗ IPAddressDeny=                                               Service does not define an IP address allow list                                0.2
✓ NotifyAccess=                                                Service child processes cannot alter service state
✓ ProtectClock=                                                Service cannot write to the hardware clock or system clock
✓ CapabilityBoundingSet=~CAP_SYS_PACCT                         Service cannot use acct()
✓ CapabilityBoundingSet=~CAP_KILL                              Service cannot send UNIX signals to arbitrary processes
✓ ProtectKernelLogs=                                           Service cannot read from or write to the kernel log ring buffer
✓ CapabilityBoundingSet=~CAP_WAKE_ALARM                        Service cannot program timers that wake up the system
✓ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER)          Service cannot override UNIX file/IPC permission checks
✓ ProtectControlGroups=                                        Service cannot modify the control group file system
✓ CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE                   Service cannot mark files immutable
✓ CapabilityBoundingSet=~CAP_IPC_LOCK                          Service cannot lock memory into RAM
✓ ProtectKernelModules=                                        Service cannot load or read kernel modules
✓ CapabilityBoundingSet=~CAP_SYS_MODULE                        Service cannot load kernel modules
✓ CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG                    Service cannot issue vhangup()
✓ CapabilityBoundingSet=~CAP_SYS_BOOT                          Service cannot issue reboot()
✓ CapabilityBoundingSet=~CAP_SYS_CHROOT                        Service cannot issue chroot()
✓ PrivateMounts=                                               Service cannot install system mounts
✓ CapabilityBoundingSet=~CAP_BLOCK_SUSPEND                     Service cannot establish wake locks
✓ MemoryDenyWriteExecute=                                      Service cannot create writable executable memory mappings
✓ RestrictNamespaces=~user                                     Service cannot create user namespaces
✓ RestrictNamespaces=~pid                                      Service cannot create process namespaces
✓ RestrictNamespaces=~net                                      Service cannot create network namespaces
✓ RestrictNamespaces=~uts                                      Service cannot create hostname namespaces
✓ RestrictNamespaces=~mnt                                      Service cannot create file system namespaces
✓ CapabilityBoundingSet=~CAP_LEASE                             Service cannot create file leases
✓ CapabilityBoundingSet=~CAP_MKNOD                             Service cannot create device nodes
✓ RestrictNamespaces=~cgroup                                   Service cannot create cgroup namespaces
✓ RestrictNamespaces=~ipc                                      Service cannot create IPC namespaces
✓ ProtectHostname=                                             Service cannot change system host/domainname
✓ CapabilityBoundingSet=~CAP_(CHOWN|FSETID|SETFCAP)            Service cannot change file ownership/access mode/capabilities
✓ CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP)                 Service cannot change UID/GID identities/capabilities
✓ LockPersonality=                                             Service cannot change ABI personality
✓ ProtectKernelTunables=                                       Service cannot alter kernel tunables (/proc/sys, …)
✓ RestrictAddressFamilies=~AF_PACKET                           Service cannot allocate packet sockets
✓ RestrictAddressFamilies=~…                                   Service cannot allocate exotic sockets
✓ CapabilityBoundingSet=~CAP_MAC_*                             Service cannot adjust SMACK MAC
✓ RestrictSUIDSGID=                                            SUID/SGID file creation by service is restricted
✓ UMask=                                                       Files created by service are accessible only by service's own user by default

→ Overall exposure level for unbound.service: 1.1 OK 🙂