Re: RFC: Adding a dyntrans in systemd pid1's forking

Chris PeBenito <chpebeni@xxxxxxxxxxxxxxxxxxx> · Wed, 13 Nov 2024 14:29:08 -0500

On 11/13/2024 1:07 PM, Kenton Groombridge wrote:
On 24/11/13 11:07AM, Chris PeBenito wrote:
On 11/6/2024 10:21 AM, Chris PeBenito wrote:
I've recently become aware of systemd's credentials feature[1].  In a
nutshell, the intent is to reduce privilege in units by systemd itself
copying the credentials (crypto materials, passwords, though in practice
it could be anything) and placing it in /run/credentials, with access
managed by namespacing.  This is intended to eliminate the need for the
daemon in the unit directly accessing the data.  My concern is the
possibility of inadvertently leaking credentials or abuse.  i.e. putting
in

LoadCredential=foobar:/etc/shadow

This illustrative, as systemd can't read shadow if it's confined, but
you could replace shadow with a private key or any other highly
confidential data systemd needs to access.  The common response to my
concern is systemd units should be protected at high integrity; only
root can modify them, etc.  However, I think we can do better to reduce
the possibility of errors.

I've discussed this with one of the systemd maintainers, and I'm
proposing this change:

1. pid1 forks (to pidN) to run the unit, as usual.
2. pidN does security_compute_create() using the current domain and the
label of the unit to get a new domain.
3. pidN does setcon() to the new domain.
4. pidN runs the unit as per usual (including the credentials stuff)

Then we'd need to add something like this to the policy:

allow init_t httpd_initrc_t:process dyntransition;
type_transition init_t httpd_unit_t:process httpd_initrc_t;

I have not yet prototyped this, but based on my discussion with the
systemd maintainers, this should be doable.  I believe the added benefit
is we can decompose initrc_t's privilege and maybe even reduce init_t's
privilege.

Hearing no objections, I've done an initial implementation:

https://github.com/systemd/systemd/compare/main...pebenito:systemd:pidN-selinux-setcon

If there is no policy in place, it does not incur new denials.  One nice
thing I found is that the unit name is available, so I used that in the
security_compute_create_name_raw() call.  I tested by adding the following
systemd-networkd.service drop-in:

[Service]
LoadCredential=shadow:/etc/shadow

I added the following to the policy:

type systemd_networkd_initrc_t;
domain_type(systemd_networkd_initrc_t)
role system_r types systemd_networkd_initrc_t;
allow init_t self:process setcurrent;
domain_dyntrans_type(init_t)
allow init_t systemd_networkd_initrc_t:process dyntransition;
type_transition init_t systemd_networkd_unit_t:process
systemd_networkd_initrc_t;
domtrans_pattern(systemd_networkd_initrc_t, systemd_networkd_exec_t,
systemd_networkd_t)

These changes resulted in this denial:

Nov 13 15:10:54 azurelinux-vm audit[605]: AVC avc:  denied  { read } for
pid=605 comm="(sd-mkdcreds)" name="shadow" dev="sda2" ino=18058
scontext=system_u:system_r:systemd_networkd_initrc_t:s0
tcontext=system_u:object_r:shadow_t:s0 tclass=file permissive=1

The remaining policy for systemd_networkd_initrc_t would look like (denials
summarized by audit2allow):

[... cut ...]
allow systemd_networkd_initrc_t bin_t:file { execute execute_no_trans getattr map open read };
[... cut ...]

This seems like a very promising way to break up initrc_t, limit privileges,
and prevent administrator errors.  What do you think?

Overall I like the direction this is going! I am curious, though, about
whether this will affect systemd units' ExecStartPre=, ExecStartPost=,
and similar directives.

It would affect Exec* directives in a unit and theoretically any other 
directives that generate a call to exec_invoke() in the systemd code.  I 
placed the setcon() immediately before

One of the problems I have when writing policy for some systemd units is
these directives will run commands normally under init_t instead of the
resulting daemon domain. A unit may, as an example, want to remove a
file on ExecStartPre=, which will run as init_t and therefore will need
to be allowed to do so by policy if the command there is a simple
/bin/rm.

This type of behavior is one of the secondary reasons I've been looking 
in to this.  In refpolicy, if init_t execs a bin_t file, it runs in 
initrc_t.  Then, because of unit-specific Exec* that run bin_t stuf, 
initrc_t gets bloated with permissions which could be (ab)used by other 
units.

Will this be extended to have these commands run under the unit's
corresponding initrc_t domain if there is one? That would solve so much
headache when dealing with these types of units and also allow breaking
up init_t potentially further.

This initial implementation does that; note the bin_t 
execute(_no_trans).   That's the unit running systemd-networkd-wait-online.

--
Chris