The 'systemd-analyze security' command looks at the unit file configuration and reports on any settings which increase the attack surface for the daemon. Since most systemd units are fairly minimalist, this is generally informing us about settings that we never put any thought into using before. In its current configuration it reports # systemd-analyze security virtlogd.service ...snip... → Overall exposure level for virtlogd.service: 9.6 UNSAFE 😨 which is pretty terrible as a score. If we apply all of the recommendations that appear possible without (knowingly) breaking functionality it reports: # systemd-analyze security virtlogd.service ...snip... → Overall exposure level for virtlogd.service: 2.2 OK 🙂 which is a pretty decent improvement. Some of the settings we would like to enable require a systemd version that is newer than that available in our oldest distro target - RHEL-8 at v239. NB, RestrictSUIDSGID is technically newer than 239, but RHEL-8 backported it, and other distros we target have it by default. Remaining recommendations are ✗ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER) We block FOWNER/IPC_OWNER, but can't block the two DAC capabilities. Historically apps/users might point QEMU to log files in $HOME, pre-created with their own user ID. ✗ IPAddressDeny= Not required since RestrictAddressFamilies blocks IP usage. Ignoring this avoids the overhead of creating a traffic filter than will never be used. ✗ NoNewPrivileges= Highly desirable, but cannot enable it yet, because it will block the ability to transition to the virtlogd_t SELinux domain during execve. The SELinux policy needs fixing to permit this transition under NNP first. ✗ PrivateTmp= There is a decent chance people have VMs configured with a serial port logfile pointing at /tmp. We would cause a regression to use private /tmp for logging ✗ PrivateUsers= This would put virtlogd inside a user namespace where its root is in fact unprivileged. Same problem as the User= setting below ✗ ProcSubset= Libraries we link to might read certain non-PID related files from /proc ✗ ProtectClock= Requires v245 ✗ ProtectHome= Same problem as PrivateTmp=. There's a decent chance that someone has a VM configured to write a logfile to /home ✗ ProtectHostname= Requires v241 ✗ ProtectKernelLogs Requires v244 ✗ ProtectProc Requires v247 ✗ ProtectSystem= We only set it to 'full', as 'strict' is not viable for our required usage ✗ RootDirectory=/RootImage= We are not capable of running inside a custom chroot given needs to write log files to arbitrary places ✗ RestrictAddressFamilies=~AF_UNIX We need AF_UNIX to communicate with other libvirt daemons ✗ SystemCallFilter=~@resources We link to libvirt.so which links to libnuma.so which has a constructor that calls set_mempolicy. This is highly undesirable todo during a constructor. ✗ User=/DynamicUser= This is highly desirable, but we currently read/write logs as root, and directories we're told to write into could be anywhere. So using a non-root user would have a major risk of regressions for applications and also have upgrade implications Signed-off-by: Daniel P. Berrangé <berrange@xxxxxxxxxx> --- src/logging/virtlogd.service.in | 94 +++++++++++++++++++++++++++++++++ 1 file changed, 94 insertions(+) diff --git a/src/logging/virtlogd.service.in b/src/logging/virtlogd.service.in index 8e245ddb43..9e3838ff34 100644 --- a/src/logging/virtlogd.service.in +++ b/src/logging/virtlogd.service.in @@ -20,5 +20,99 @@ OOMScoreAdjust=-900 # per systemd recommendations LimitNOFILE=1024:524288 +CapabilityBoundingSet=~CAP_AUDIT_CONTROL +CapabilityBoundingSet=~CAP_AUDIT_READ +CapabilityBoundingSet=~CAP_AUDIT_WRITE +CapabilityBoundingSet=~CAP_BLOCK_SUSPEND +CapabilityBoundingSet=~CAP_CHOWN +# Mgmt app/user might have pre-created log files that we're +# told to open and write to, or be storing them in otherwise +# inaccessible locations like $HOME. So we need to ignore +# DAC permission checks. +#CapabilityBoundingSet=~CAP_DAC_OVERRIDE +#CapabilityBoundingSet=~CAP_DAC_READ_SEARCH +CapabilityBoundingSet=~CAP_FOWNER +CapabilityBoundingSet=~CAP_FSETID +CapabilityBoundingSet=~CAP_IPC_LOCK +CapabilityBoundingSet=~CAP_IPC_OWNER +CapabilityBoundingSet=~CAP_KILL +CapabilityBoundingSet=~CAP_LEASE +CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE +CapabilityBoundingSet=~CAP_MAC_ADMIN +CapabilityBoundingSet=~CAP_MAC_OVERRIDE +CapabilityBoundingSet=~CAP_MKNOD +CapabilityBoundingSet=~CAP_NET_ADMIN +CapabilityBoundingSet=~CAP_NET_BIND_SERVICE +CapabilityBoundingSet=~CAP_NET_BROADCAST +CapabilityBoundingSet=~CAP_NET_RAW +CapabilityBoundingSet=~CAP_SETFCAP +CapabilityBoundingSet=~CAP_SETPCAP +CapabilityBoundingSet=~CAP_SETGID +CapabilityBoundingSet=~CAP_SETUID +CapabilityBoundingSet=~CAP_SYSLOG +CapabilityBoundingSet=~CAP_SYS_ADMIN +CapabilityBoundingSet=~CAP_SYS_BOOT +CapabilityBoundingSet=~CAP_SYS_CHROOT +CapabilityBoundingSet=~CAP_SYS_MODULE +CapabilityBoundingSet=~CAP_SYS_NICE +CapabilityBoundingSet=~CAP_SYS_PACCT +CapabilityBoundingSet=~CAP_SYS_PTRACE +CapabilityBoundingSet=~CAP_SYS_RAWIO +CapabilityBoundingSet=~CAP_SYS_RESOURCE +CapabilityBoundingSet=~CAP_SYS_TIME +CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG +CapabilityBoundingSet=~CAP_WAKE_ALARM + +LockPersonality=true +MemoryDenyWriteExecute=true +# Cannot enable this as it prevents transitioning to +# the confined SELinux virtlogd_t domain on execve +# unless we modify the policy to allow this. +#NoNewPrivileges=true +PrivateDevices=true +PrivateMounts=true +PrivateNetwork=true +# XXX someone could configure QEMU to log a serial port to an +# arbitrary directory, including /tmp, even if this is ill-advised +#PrivateTmp=true +# Not until oldest build target has systemd >= v245 +#ProtectClock=true +ProtectControlGroups=true +# Not until oldest build target has systemd >= v241 +#ProtectHostname=true +# Not until oldest build target has systemd >= v244 +#ProtectKernelLogs=true +ProtectKernelModules=true +ProtectKernelTunables=true +# Not until oldest build target has systemd >= v247 +#ProtectProc=invisible +ProtectSystem=full +RestrictAddressFamilies=AF_UNIX +RestrictNamespaces=~cgroup +RestrictNamespaces=~ipc +RestrictNamespaces=~mnt +RestrictNamespaces=~net +RestrictNamespaces=~pid +RestrictNamespaces=~user +RestrictNamespaces=~uts +RestrictRealtime=true +RestrictSUIDSGID=true +SystemCallArchitectures=native +SystemCallFilter=~@clock +SystemCallFilter=~@debug +SystemCallFilter=~@module +SystemCallFilter=~@mount +SystemCallFilter=~@raw-io +SystemCallFilter=~@reboot +SystemCallFilter=~@swap +SystemCallFilter=~@privileged +# Unfortunately we link to libnuma via libvirt.so which +# has a constructor that runs unconditionally that invokes +# set_mempolicy() +#SystemCallFilter=~@resources +SystemCallFilter=~@cpu-emulation +SystemCallFilter=~@obsolete +UMask=077 + [Install] Also=virtlogd.socket -- 2.41.0