RuntimeDirectory for ephemeral chroot environment

Dear list,

I have a question about the intended use of the RuntimeDirectory=
directive, specifically about using it for what I'd call ephemeral chroot
environments.  I would also like some clarification about
RestrictAddressFamilies=. That is related only in that it happened to
come up when hardening a socket-activated service unit, so if it is
better to handle that in a separate thread, just tell me and I will
create one.

A bit of context:
I only recently found out about `systemd-analyze security` and used it
to harden `unbound.service` which comes with Ubuntu 24.04 LTS. And while
I was at it I came up with a way to get a chroot environment whose
lifetime is limited to the runtime of the service itself, like so:

	[Service]
	DynamicUser=yes

	# Only confine ExecStart=, so we need neither Bind* unbound-control nor the unix socket
	RootDirectoryStartOnly=yes
	# Use systemd's chroot capabilities
	RootDirectory=%t/%N
	# put it in a runtime directory so it leaves no trace after exit and need not exist
	# this should be the same path as above
	RuntimeDirectory=%N
	# But it is not an actual runtime directory meant to be writeable
	UnsetEnvironment=RUNTIME_DIRECTORY
	ReadOnlyPaths=+%t/%N

For reference, see the attached full output of
`systemctl cat unbound.service`. I hope the comments
carry the intent. The idea is to have the chroot created when the
service starts and destroyed when it exits. There seems to be no other
way to do this than to (ab)use `RuntimeDirectory=` as shown above. I've
tried this with `TemporaryFileSystem=`, but that requires that the
directory already exists.

So my main concern is if there are any side effects I may not be aware
of that might come back to bite me. So far this unit runs like a charm
with the least privileges I could manage to get working.
`systemd-analyze security` shows an exposure level of "1.1 OK", so
that's pretty good. And thanks by the way for this tool, it is great for
finding things I, as well as upstream and/or Canonical, was not even
aware of. I am even considering sending these as enhancements to one or
both of the latter. But that depends on what you, the experts, tell me.
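
To keep track of what is still flagged, the report can be filtered down to
the remaining findings. This is only a minimal sketch; `failing_checks` is
a made-up helper name, and it assumes the report keeps the layout shown in
the attachment (marker in the first column, setting name in the second):

```shell
#!/bin/sh
# failing_checks: read `systemd-analyze security` output on stdin and
# print the name of every setting that is marked with a cross.
# (Hypothetical helper, it relies on the column layout of the report.)
failing_checks() {
    awk '$1 == "✗" { print $2 }'
}

# Usage:
#   systemd-analyze security unbound.service | failing_checks
```

For the attached report this would list the three RestrictAddressFamilies=
entries, PrivateNetwork=, DeviceAllow= and IPAddressDeny=.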

My rationale for doing this is that the unit becomes self-contained:
nothing needs to happen outside of it to set up its runtime
environment.
The `UnsetEnvironment=RUNTIME_DIRECTORY` and
`ReadOnlyPaths=+%t/%N` may be unnecessary, since I don't expect unbound
to use either the variable or the path. But then again this is about
hardening, and my thinking was that a compromised service might be able
to make use of them. I also wanted to check whether this creates a loop,
since `+%t/%N` would point back at the RootDirectory, but I was unable
to see the contents of `+%t/%N`, if any. I am not quite sure why I could
not see anything in there, but I suspect it has to do with the service
being `unshare`d. Additional info on this would be very welcome too,
even if it's just an RTFM pointer; the documentation is kind of
overwhelming, after all.
I hope this covers my main question.
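
One way to look at the service's view of the file system from the host
might be to go through procfs instead of the host path, since the mounts
set up for the service live in its own mount namespace. The following is
a sketch under assumptions: `proc_root_path` is a made-up helper, %t/%N
is taken to expand to /run/unbound for this system unit, and the commands
in the usage comment have to be run as root while the service is up:

```shell
#!/bin/sh
# proc_root_path PID PATH
# Map PATH as seen by process PID to a path reachable from the host
# through /proc/PID/root. (Hypothetical helper for illustration.)
proc_root_path() {
    printf '/proc/%s/root/%s\n' "$1" "${2#/}"
}

# Usage (as root, service running):
#   pid=$(systemctl show --property=MainPID --value unbound.service)
#   ls -la "$(proc_root_path "$pid" /run/unbound)"
#   findmnt --task "$pid"    # mount table of the service's namespace
```

Whether /run/unbound exists inside the chroot at all is exactly what the
`ls` call would show; the sketch does not assume either outcome.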

As for the `RestrictAddressFamilies=` directive, I want to know if it is
even possible to restrict AF_{NETLINK,UNIX,INET,INET6} when a service is
socket-activated. I somehow got the idea in my head that the service
executable should not need to do any binding itself since that should
have happened already by starting the corresponding socket unit. But
this seems impossible with unbound. Or am I misunderstanding how this
works? Is that at least theoretically possible? I am half suspecting
that it is but that unbound does not support this since
systemd-integration seems to be an afterthought. So if this is only
because of some missing integration on unbound's part I would like to
get that upstreamed, because then I could check three more boxes in
`systemd-analyze security`.
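
For context, systemd.exec(5) notes that RestrictAddressFamilies= only
affects the socket(2) system call and that sockets passed in through
socket activation are unaffected. A service that takes over its listeners
this way finds them as inherited file descriptors, following the
convention described in sd_listen_fds(3). Below is a minimal sketch of
that convention only; `activated_fds` and the sample values are made up,
and it says nothing about whether unbound actually consumes LISTEN_FDS.
A resolver presumably still has to create sockets for its outgoing
queries, so this alone may not allow dropping AF_INET and AF_INET6.

```shell
#!/bin/sh
# Inherited sockets start at fd 3 (SD_LISTEN_FDS_START in sd_listen_fds(3)).
SD_LISTEN_FDS_START=3

# activated_fds LISTEN_PID LISTEN_FDS OWN_PID
# Print the inherited fd numbers, one per line. Print nothing when
# LISTEN_PID does not match the receiving process, because then the
# variables were meant for some other process.
activated_fds() {
    listen_pid=$1 listen_fds=$2 own_pid=$3
    [ "$listen_pid" = "$own_pid" ] || return 0
    i=0
    while [ "$i" -lt "${listen_fds:-0}" ]; do
        echo $((SD_LISTEN_FDS_START + i))
        i=$((i + 1))
    done
}

# Made-up values: three listeners, matching the three Listen* lines in
# the attached unbound.socket.
activated_fds 1234 3 1234
```

In a real service the first two arguments would come from the
environment variables $LISTEN_PID and $LISTEN_FDS and the third from the
process's own PID.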

For reference, I am attaching the output of `systemctl cat
unbound.{service,socket}` and `systemd-analyze security unbound.service`.

`systemd --version`:
systemd 255 (255.4-1ubuntu8.5) +PAM +AUDIT +SELINUX +APPARMOR +IMA
+SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS
+FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY
+P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK
-XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified


Thanks for your time and, of course, systemd!
Peter
# /etc/systemd/system/unbound.socket
# /etc/systemd/system/unbound.service
[Unit]
Description=Socket(s) (including control) for Unbound DNS server
Documentation=man:unbound(8)
DefaultDependencies=no
After=systemd-sysusers.service
Requires=sysinit.target
Conflicts=shutdown.target
#Before=systemd-resolved.service sysinit.target network.target nss-lookup.target shutdown.target
Before=systemd-resolved.service nss-lookup.target shutdown.target

[Socket]
ListenDatagram=127.0.0.1:53
ListenStream=127.0.0.1:53
ListenStream=%t/unbound-control/%N.ctl
SocketGroup=%N
SocketMode=0660

[Install]
WantedBy=sockets.target
# /etc/systemd/system/unbound.service
[Unit]
Description=Unbound DNS server
Documentation=man:unbound(8)
After=network.target
Before=nss-lookup.target
Wants=nss-lookup.target

[Service]
Type=notify
Restart=on-failure
EnvironmentFile=-/etc/default/unbound
ExecStartPre=-/usr/libexec/unbound-helper chroot_setup
ExecStartPre=-/usr/libexec/unbound-helper root_trust_anchor_update
ExecStart=/usr/sbin/unbound -d -p $DAEMON_OPTS
ExecStopPost=-/usr/libexec/unbound-helper chroot_teardown
ExecReload=+/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/unbound.service.d/override-hardening.conf

# /etc/systemd/system/unbound.service.d/override.conf
# This overrides the service unit that comes with Ubuntu in an effort to
# maximize security.
[Unit]
# For verifying keys, time should be synced first
After=time-sync.target

[Service]
# do not run chroot helper
ExecStartPre=
# This is done by external service unit, see additional *-hardening.conf
# ExecStartPre=-/usr/libexec/unbound-helper root_trust_anchor_update

ExecStart=
# Don't daemonize and send output to stderr for journald
ExecStart=unbound -ddp $DAEMON_OPTS
# No daemonizing and hence no PID file necessary, see -d and -p in unbound(8)
PIDFile=
# No chroot teardown helper necessary
ExecStopPost=

ExecReload=
# This at least waits for a reply in contrast to just sending HUP
ExecReload=unbound-control -q reload

[Install]
# Don't install any dependencies since we rely solely on socket activation
WantedBy=
Also=%N.socket

# /etc/systemd/system/unbound.service.d/zz-override-hardening.conf
# This overrides the service unit that comes with Ubuntu in an effort to
# maximize security.
[Unit]
# Use external bootstrapping b/c this one is too restrictive
Requires=%N-trust-anchor-update.service
After=%N-trust-anchor-update.service
# Double check if it worked
AssertPathExists=%S/%N/root.key

# Hard dependency on socket because we have no privileges
BindsTo=%N.socket
#After=<unnecessary b/c sockets have implicit Before=>

[Service]
DynamicUser=yes

# Only confine ExecStart=, so we need neither Bind* unbound-control nor the unix socket
RootDirectoryStartOnly=yes
# Use systemd's chroot capabilities
RootDirectory=%t/%N
# put it in a runtime directory so it leaves no trace after exit and need not exist
# this should be the same path as above
RuntimeDirectory=%N
# But it is not an actual runtime directory meant to be writeable
UnsetEnvironment=RUNTIME_DIRECTORY
ReadOnlyPaths=+%t/%N

# root.key lives in StateDirectory, i.e. /var/lib/unbound or /var/lib/private/unbound, in
# case of running as dynamic user
StateDirectory=%N
ConfigurationDirectory=%N

## Binaries
# need unbound-control for reload action and the control socket
#BindReadOnlyPaths=/usr/sbin/unbound /usr/sbin/unbound-control %t/%N-control/%N.ctl
BindReadOnlyPaths=/usr/sbin/unbound

# For some reason unbound needs access to this too
# lest it complain about systemd not running?
BindReadOnlyPaths=%t/systemd/system

BindReadOnlyPaths=/etc/ssl/certs/ca-certificates.crt

## required shared objects
# linker
BindReadOnlyPaths=/lib64/ld-linux-x86-64.so.2
# printf 'BindReadOnlyPaths=%s\n' $(ldd $(command which unbound) |
#	sed -nE 's/.* => (.*) \([^)]*\)$/\1/p'
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libc.so.6
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libcap.so.2
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libcrypto.so.3
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libevent-2.1.so.7
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libexpat.so.1
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libgcrypt.so.20
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libgpg-error.so.0
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libhiredis.so.1.1.0
BindReadOnlyPaths=/lib/x86_64-linux-gnu/liblz4.so.1
BindReadOnlyPaths=/lib/x86_64-linux-gnu/liblzma.so.5
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libm.so.6
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libnghttp2.so.14
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libprotobuf-c.so.1
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libpython3.12.so.1.0
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libssl.so.3
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libsystemd.so.0
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libz.so.1
BindReadOnlyPaths=/lib/x86_64-linux-gnu/libzstd.so.1

ProtectClock=yes
# Implied by DynamicUser=yes
ProtectSystem=strict
ProtectHome=yes

PrivateDevices=yes
ProtectKernelTunables=yes
ProtectControlGroups=yes
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6 AF_NETLINK
CapabilityBoundingSet=
SystemCallFilter=~@clock @cpu-emulation @debug @module @mount @obsolete @privileged @raw-io @reboot @resources @swap
#SystemCallFilter=~@clock @cpu-emulation @debug @module @mount @obsolete @raw-io @reboot @resources @swap
SystemCallArchitectures=native
# only necessary without socket activation and unbound doing chroot
#CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_SETUID CAP_SETGID CAP_CHOWN
LockPersonality=yes
NoNewPrivileges=yes
PrivateUsers=yes
ProtectHostname=yes
ProtectKernelLogs=yes
ProtectKernelModules=yes
ProtectProc=invisible
MountAPIVFS=yes
ProcSubset=pid
MemoryDenyWriteExecute=yes

RestrictRealtime=yes
RestrictNamespaces=yes
UMask=077


[Install]
# Don't install any dependencies since we rely solely on socket activation
WantedBy=
  NAME                                                        DESCRIPTION                                                                   EXPOSURE
✓ SystemCallFilter=~@swap                                     System call deny list defined for service, and @swap is included              
✓ SystemCallFilter=~@resources                                System call deny list defined for service, and @resources is included         
✓ SystemCallFilter=~@reboot                                   System call deny list defined for service, and @reboot is included            
✓ SystemCallFilter=~@raw-io                                   System call deny list defined for service, and @raw-io is included            
✓ SystemCallFilter=~@privileged                               System call deny list defined for service, and @privileged is included        
✓ SystemCallFilter=~@obsolete                                 System call deny list defined for service, and @obsolete is included          
✓ SystemCallFilter=~@mount                                    System call deny list defined for service, and @mount is included             
✓ SystemCallFilter=~@module                                   System call deny list defined for service, and @module is included            
✓ SystemCallFilter=~@debug                                    System call deny list defined for service, and @debug is included             
✓ SystemCallFilter=~@cpu-emulation                            System call deny list defined for service, and @cpu-emulation is included     
✓ SystemCallFilter=~@clock                                    System call deny list defined for service, and @clock is included             
✓ RemoveIPC=                                                  Service user cannot leave SysV IPC objects around                             
✓ User=/DynamicUser=                                          Service runs under a transient non-root user identity                         
✓ RestrictRealtime=                                           Service realtime scheduling access is restricted                              
✓ CapabilityBoundingSet=~CAP_SYS_TIME                         Service processes cannot change the system clock                              
✓ NoNewPrivileges=                                            Service processes cannot acquire new privileges                               
✓ AmbientCapabilities=                                        Service process does not receive ambient capabilities                         
✓ CapabilityBoundingSet=~CAP_BPF                              Service may load BPF programs                                                 
✓ SystemCallArchitectures=                                    Service may execute system calls only with native ABI                         
✗ RestrictAddressFamilies=~AF_NETLINK                         Service may allocate netlink sockets                                               0.1
✗ RestrictAddressFamilies=~AF_UNIX                            Service may allocate local sockets                                                 0.1
✗ RestrictAddressFamilies=~AF_(INET|INET6)                    Service may allocate Internet sockets                                              0.3
✓ ProtectSystem=                                              Service has strict read-only access to the OS file hierarchy                  
✓ ProtectProc=                                                Service has restricted access to process tree (/proc hidepid=)                
✓ SupplementaryGroups=                                        Service has no supplementary groups                                           
✓ CapabilityBoundingSet=~CAP_SYS_RAWIO                        Service has no raw I/O access                                                 
✓ CapabilityBoundingSet=~CAP_SYS_PTRACE                       Service has no ptrace() debugging abilities                                   
✓ CapabilityBoundingSet=~CAP_SYS_(NICE|RESOURCE)              Service has no privileges to change resource use parameters                   
✓ CapabilityBoundingSet=~CAP_NET_ADMIN                        Service has no network configuration privileges                               
✓ CapabilityBoundingSet=~CAP_NET_(BIND_SERVICE|BROADCAST|RAW) Service has no elevated networking privileges                                 
✓ CapabilityBoundingSet=~CAP_AUDIT_*                          Service has no audit subsystem access                                         
✓ CapabilityBoundingSet=~CAP_SYS_ADMIN                        Service has no administrator privileges                                       
✓ PrivateTmp=                                                 Service has no access to other software's temporary files                     
✓ ProcSubset=                                                 Service has no access to non-process /proc files (/proc subset=)              
✓ CapabilityBoundingSet=~CAP_SYSLOG                           Service has no access to kernel logging                                       
✓ ProtectHome=                                                Service has no access to home directories                                     
✓ PrivateDevices=                                             Service has no access to hardware devices                                     
✓ RootDirectory=/RootImage=                                   Service has its own root directory/image                                      
✗ PrivateNetwork=                                             Service has access to the host's network                                           0.5
✗ DeviceAllow=                                                Service has a device ACL with some special devices: char-rtc:r                     0.1
✓ KeyringMode=                                                Service doesn't share key material with other services                        
✓ Delegate=                                                   Service does not maintain its own delegated control group subtree             
✓ PrivateUsers=                                               Service does not have access to other users                                   
✗ IPAddressDeny=                                              Service does not define an IP address allow list                                   0.2
✓ NotifyAccess=                                               Service child processes cannot alter service state                            
✓ ProtectClock=                                               Service cannot write to the hardware clock or system clock                    
✓ CapabilityBoundingSet=~CAP_SYS_PACCT                        Service cannot use acct()                                                     
✓ CapabilityBoundingSet=~CAP_KILL                             Service cannot send UNIX signals to arbitrary processes                       
✓ ProtectKernelLogs=                                          Service cannot read from or write to the kernel log ring buffer               
✓ CapabilityBoundingSet=~CAP_WAKE_ALARM                       Service cannot program timers that wake up the system                         
✓ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER)         Service cannot override UNIX file/IPC permission checks                       
✓ ProtectControlGroups=                                       Service cannot modify the control group file system                           
✓ CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE                  Service cannot mark files immutable                                           
✓ CapabilityBoundingSet=~CAP_IPC_LOCK                         Service cannot lock memory into RAM                                           
✓ ProtectKernelModules=                                       Service cannot load or read kernel modules                                    
✓ CapabilityBoundingSet=~CAP_SYS_MODULE                       Service cannot load kernel modules                                            
✓ CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG                   Service cannot issue vhangup()                                                
✓ CapabilityBoundingSet=~CAP_SYS_BOOT                         Service cannot issue reboot()                                                 
✓ CapabilityBoundingSet=~CAP_SYS_CHROOT                       Service cannot issue chroot()                                                 
✓ PrivateMounts=                                              Service cannot install system mounts                                          
✓ CapabilityBoundingSet=~CAP_BLOCK_SUSPEND                    Service cannot establish wake locks                                           
✓ MemoryDenyWriteExecute=                                     Service cannot create writable executable memory mappings                     
✓ RestrictNamespaces=~user                                    Service cannot create user namespaces                                         
✓ RestrictNamespaces=~pid                                     Service cannot create process namespaces                                      
✓ RestrictNamespaces=~net                                     Service cannot create network namespaces                                      
✓ RestrictNamespaces=~uts                                     Service cannot create hostname namespaces                                     
✓ RestrictNamespaces=~mnt                                     Service cannot create file system namespaces                                  
✓ CapabilityBoundingSet=~CAP_LEASE                            Service cannot create file leases                                             
✓ CapabilityBoundingSet=~CAP_MKNOD                            Service cannot create device nodes                                            
✓ RestrictNamespaces=~cgroup                                  Service cannot create cgroup namespaces                                       
✓ RestrictNamespaces=~ipc                                     Service cannot create IPC namespaces                                          
✓ ProtectHostname=                                            Service cannot change system host/domainname                                  
✓ CapabilityBoundingSet=~CAP_(CHOWN|FSETID|SETFCAP)           Service cannot change file ownership/access mode/capabilities                 
✓ CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP)                Service cannot change UID/GID identities/capabilities                         
✓ LockPersonality=                                            Service cannot change ABI personality                                         
✓ ProtectKernelTunables=                                      Service cannot alter kernel tunables (/proc/sys, …)                           
✓ RestrictAddressFamilies=~AF_PACKET                          Service cannot allocate packet sockets                                        
✓ RestrictAddressFamilies=~…                                  Service cannot allocate exotic sockets                                        
✓ CapabilityBoundingSet=~CAP_MAC_*                            Service cannot adjust SMACK MAC                                               
✓ RestrictSUIDSGID=                                           SUID/SGID file creation by service is restricted                              
✓ UMask=                                                      Files created by service are accessible only by service's own user by default 

→ Overall exposure level for unbound.service: 1.1 OK 🙂
