From: Darrick J. Wong <djwong@xxxxxxxxxx> Currently, xfs_scrub has to run with some elevated privileges. Minimize the risk of xfs_scrub escaping its service container or contaminating the rest of the system by using systemd's sandboxing controls to prohibit as much access as possible. The directives added by this patch were recommended by the command 'systemd-analyze security xfs_scrub@.service' in systemd 249. Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> --- scrub/xfs_scrub@xxxxxxxxxxx | 78 +++++++++++++++++++++++++++++++++++++++---- 1 file changed, 70 insertions(+), 8 deletions(-) diff --git a/scrub/xfs_scrub@xxxxxxxxxxx b/scrub/xfs_scrub@xxxxxxxxxxx index 1fafcfd37f1..e306216bb91 100644 --- a/scrub/xfs_scrub@xxxxxxxxxxx +++ b/scrub/xfs_scrub@xxxxxxxxxxx @@ -7,18 +7,16 @@ Description=Online XFS Metadata Check for %f OnFailure=xfs_scrub_fail@%i.service Documentation=man:xfs_scrub(8) +ConditionCapability=CAP_SYS_ADMIN +ConditionCapability=CAP_FOWNER +ConditionCapability=CAP_DAC_OVERRIDE +ConditionCapability=CAP_DAC_READ_SEARCH +ConditionCapability=CAP_SYS_RAWIO [Service] Type=oneshot -PrivateNetwork=true -ProtectSystem=full -ProtectHome=read-only -# Disable private /tmp just in case %f is a path under /tmp. -PrivateTmp=no -AmbientCapabilities=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO -NoNewPrivileges=yes -User=nobody Environment=SERVICE_MODE=1 +Environment=SERVICE_MOUNTPOINT=/tmp/scrub ExecStart=@sbindir@/xfs_scrub @scrub_args@ %f SyslogIdentifier=%N @@ -31,3 +29,67 @@ Nice=19 # Create the service underneath the scrub background service slice so that we # can control resource usage. Slice=system-xfs_scrub.slice + +# No realtime CPU scheduling +RestrictRealtime=true + +# Dynamically create a user that isn't root +DynamicUser=true + +# Make the entire filesystem readonly and /home inaccessible, then bind mount +# the filesystem we're supposed to be checking into our private /tmp dir. +# 'norbind' means that we don't bind anything under that original mount. +ProtectSystem=strict +ProtectHome=yes +PrivateTmp=true +BindPaths=%f:/tmp/scrub:norbind + +# Don't let scrub complain about paths in /etc/projects that have been hidden +# by our sandboxing. scrub doesn't care about project ids anyway. +InaccessiblePaths=-/etc/projects + +# No network access +PrivateNetwork=true +ProtectHostname=true +RestrictAddressFamilies=none +IPAddressDeny=any + +# Don't let the program mess with the kernel configuration at all +ProtectKernelLogs=true +ProtectKernelModules=true +ProtectKernelTunables=true +ProtectControlGroups=true +ProtectProc=invisible +RestrictNamespaces=true + +# Hide everything in /proc, even /proc/mounts +ProcSubset=pid + +# Only allow the default personality Linux +LockPersonality=true + +# No writable memory pages +MemoryDenyWriteExecute=true + +# Don't let our mounts leak out to the host +PrivateMounts=true + +# Restrict system calls to the native arch and only enough to get things going +SystemCallArchitectures=native +SystemCallFilter=@system-service +SystemCallFilter=~@privileged +SystemCallFilter=~@resources +SystemCallFilter=~@mount + +# xfs_scrub needs these privileges to run, and no others +CapabilityBoundingSet=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO +AmbientCapabilities=CAP_SYS_ADMIN CAP_FOWNER CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_SYS_RAWIO +NoNewPrivileges=true + +# xfs_scrub doesn't create files +UMask=7777 + +# No access to hardware /dev files except for block devices +ProtectClock=true +DevicePolicy=closed +DeviceAllow=block-*