Systemd socket labeling issue

Daniel Burgener <dburgener@xxxxxxxxxxxxxxxxxxx> · Tue, 3 Dec 2024 15:59:05 -0500

We've recently noticed an issue with how systemd handles SELinux 
labeling for sockets.

In the common case, systemd checks the label of the binary it expects to 
execute, then calls security_compute_create_raw() to determine the label 
of the process it will create, and applies that label to the socket 
using setsockcreatecon().  This makes sense as it matches the label the 
socket would get if the process created it itself.

However, when certain systemd directives are set, such as RootImage= or 
ExtensionImage=, systemd simply skips the above behavior and creates 
sockets without any special labeling handling, so they inherit the label 
of systemd (typically init_t):

https://github.com/systemd/systemd/blob/13a42b776db9f4bd1e827091b6640801c54304e0/src/core/service.c#L5483-L5486

The result is that a socket ends up with either the label of the 
process or the label of systemd, depending on unrelated systemd 
directives.  Additionally, the init_t label prevents policy authors from 
controlling access to these sockets granularly.

With most upstream policies this still works functionally.  Fedora 
added a "temporary" workaround back in 2010 that allows all init-started 
daemons access to init_t sockets, and never removed it:

https://github.com/fedora-selinux/selinux-policy/blob/8dfcddb1f7227bbdf98776f795be53cf50734b04/policy/modules/system/init.te#L604-L605

That workaround accidentally got pulled into refpolicy in a large block 
of systemd changes back in 2017:

https://github.com/SELinuxProject/refpolicy/blob/6e54a2eda6f493c585a3fc59e8ddc54f341dbf0c/policy/modules/system/init.te#L1600-L1601

So in practice, many upstream policies allow the access either way, 
which prevents functional issues.

We've spoken with a few systemd maintainers internally, and they have 
indicated that there is a fundamental timing issue with the current 
approach: there are use cases where the socket must be available before 
the image that contains the binary, so determining the label of the 
binary prior to socket creation is impossible.

* The current approach of applying the label of the resulting process 
seems impossible to do in all cases from a systemd perspective
* Reading the expected binary label from the file_contexts would avoid 
the timing issue, but assumes a system where the binary labels generally 
match the file_contexts
* Inheriting the init_t label prevents policy from distinguishing 
between different systemd-created sockets, and conflates IPC with 
systemd itself with IPC with systemd-spawned processes
* Setting some other static label for all sockets avoids the conflation 
between systemd and its children, but not between various children
* Checking the file_contexts for the path of the socket makes a lot of 
sense when sockets have paths, but systemd also supports creating 
sockets without paths, such as abstract unix sockets
* Using the SELinuxContext= systemd directive causes systemd to use that 
label for the resulting process (and therefore socket), so it skips 
checking the binary and socket labeling works.  However, this scatters 
policy details across unit files, and doesn't permit decoupling unit 
files and policy.  Not to mention that it's unintuitive to expect anyone 
to know that when they use RootImage= or ExtensionImage= they must also 
use SELinuxContext= or their sockets will be mislabeled.

We're curious about the community's thoughts here.  Any ideas or 
suggestions for how we might address this situation?

-Daniel