On 24/5/24 18:44, Lennart Poettering wrote:
On Fr, 24.05.24 17:39, Dimitris Karakasilis (dimitris@xxxxxxxxxxxxxx) wrote:
we (at kairos.io) are trying to understand how systemd-sysext
extensions can
Hmm, I thought kairos wasn't so fond of systemd?
Why would you think that? Kairos is distro-agnostic, so it tries to
work on OpenRC-based distros as well, but to be honest the systemd-based
ones are better supported and tested.
Thanks for the detailed information (below). We are not familiar enough
with these features to contribute an implementation, but we'll keep
an eye on development and contribute in any way we can.
also be made tamper-proof by being measured in a system that boots in UKI
mode.
It's pretty simple: there's no nice support for comprehensively
measuring sysext images right now. There's support for measuring the
sysext images passed into the UKI into PCR 13, but that's pretty much
it: there's no support for measuring sysexts activated from other
sources or later during runtime.
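For readers less familiar with the mechanics: "measuring" into a PCR means extending it, so the final value depends on every measured item and on their order. A minimal Python sketch, assuming a single SHA-256 bank and using made-up byte strings in place of the sysext images systemd-stub actually hashes; it ignores the event log format entirely:

```python
import hashlib

PCR_SIZE = 32  # SHA-256 bank: every PCR value is a 32-byte digest


def pcr_extend(pcr: bytes, measured_data: bytes) -> bytes:
    """Simulated extend of a SHA-256 PCR: new = SHA256(old || SHA256(data))."""
    digest = hashlib.sha256(measured_data).digest()
    return hashlib.sha256(pcr + digest).digest()


def replay(events: list[bytes]) -> bytes:
    """Recompute the PCR value a verifier should expect from an event log.

    PCRs in the 0..15 range (PCR 13 included) start out as all zeroes after a reset.
    """
    pcr = bytes(PCR_SIZE)
    for event in events:
        pcr = pcr_extend(pcr, event)
    return pcr


# Hypothetical payloads standing in for two sysext images passed to the UKI.
log = [b"sysext-image-A", b"sysext-image-B"]
print(replay(log).hex())
# Reordering the same events yields a different value: order is part of the measurement.
print(replay(log) == replay(list(reversed(log))))  # False
```

Nothing can be removed from a PCR once extended; the only way to "explain" its value is to replay the log of what went into it.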
So there are two issues:
1. Right now we don't really have another PCR to spare. The various
PCRs systemd measures stuff into right now contain measurements
that typically happen only once during boot. That makes them really
nice for validating/attesting boot success, for binding policy to,
and so on: they are relatively stable, they "settle"
eventually. Measurements of sysexts on activation are different: after
all, sysexts are added/removed/updated during runtime all the time,
hence they should probably be expected to form a continuing series of
measurements, one for each activation during runtime. That makes them
nice for attestation, but much less useful for binding policy to.
Hence, I think there's a strong reason for keeping these measurements
separate from the existing ones, i.e. placing them in a separate PCR
– but we have none left.
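To illustrate why a continuing series of measurements suits attestation but not policy binding, here is a small simulation. It assumes the usual SHA-256 extend rule; the event strings and the RuntimePCR class are hypothetical and not a systemd API:

```python
import hashlib


def extend(pcr: bytes, data: bytes) -> bytes:
    """Simulated SHA-256 PCR extend: new = SHA256(old || SHA256(data))."""
    return hashlib.sha256(pcr + hashlib.sha256(data).digest()).digest()


class RuntimePCR:
    """A simulated PCR plus the event log a verifier needs to replay it."""

    def __init__(self) -> None:
        self.value = bytes(32)
        self.log: list[bytes] = []

    def measure(self, event: bytes) -> None:
        self.value = extend(self.value, event)
        self.log.append(event)


def policy_matches(pcr: RuntimePCR, expected: bytes) -> bool:
    """Policy binding: only succeed if the PCR equals a value fixed in advance."""
    return pcr.value == expected


def attest(pcr: RuntimePCR) -> bool:
    """Attestation: replay the log and check it reproduces the current value."""
    replayed = bytes(32)
    for event in pcr.log:
        replayed = extend(replayed, event)
    return replayed == pcr.value


pcr = RuntimePCR()
pcr.measure(b"sysext:foo")        # hypothetical activation event
sealed_to = pcr.value             # e.g. a secret bound to the value at this moment
pcr.measure(b"sysext:bar")        # a later activation during runtime

print(policy_matches(pcr, sealed_to))  # False: the value has moved on
print(attest(pcr))                     # True: the log still explains the value
```

A secret bound to the value after the first activation becomes unreachable after the second, while replaying the log still accounts for the current value. Mixing such events into the boot-time PCRs would make those stop settling too.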
Now, TPM2 allows adding new "fake" PCRs via a special type of
nvindex, so that this restriction goes away. It's high on our todo
list to have an API for "registering" such "fake" PCRs (which would
mean: allocating the nvindex with an appropriate locked-down policy,
and then storing information about this somewhere). This should
probably be placed in systemd-pcrextend@.service, which already
provides an API to measure arbitrary stuff into arbitrary PCRs, so it
looks like a nice place to also allow allocating "fake" PCRs and
measuring arbitrary stuff into them. This is probably not
particularly involved, but so far no one has worked on this.
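What "registering" a fake PCR might involve, reduced to the bookkeeping half: the sketch below never touches a TPM, and FakePCRRegistry, BASE_HANDLE and the JSON layout are all hypothetical. The only TPM facts it relies on are that NV index handles carry 0x01 as their top byte and that NV indices are created with TPM2_NV_DefineSpace:

```python
import json
from dataclasses import asdict, dataclass

TPM_HT_NV_INDEX = 0x01       # handle type (top byte) of TPM 2.0 NV index handles
BASE_HANDLE = 0x01000100     # HYPOTHETICAL base; a real one must follow the TCG handle registry


@dataclass
class FakePCR:
    name: str       # e.g. "dm-verity" (hypothetical name)
    nv_index: int   # NV index handle backing this "fake PCR"
    hash_alg: str   # digest algorithm the index was defined with


class FakePCRRegistry:
    """Hypothetical bookkeeping for "fake PCRs"; it never talks to a TPM.

    A real implementation would first define the NV index on the TPM
    (TPM2_NV_DefineSpace, presumably as an extend-type index) with a
    locked-down policy, and only then record it here.
    """

    def __init__(self) -> None:
        self.entries: dict[str, FakePCR] = {}

    def register(self, name: str, hash_alg: str = "sha256") -> FakePCR:
        if name in self.entries:
            raise ValueError(f"fake PCR {name!r} is already registered")
        nv_index = BASE_HANDLE + len(self.entries)
        if nv_index >> 24 != TPM_HT_NV_INDEX:
            raise RuntimeError("ran out of handles in the NV index range")
        entry = FakePCR(name, nv_index, hash_alg)
        self.entries[name] = entry
        return entry

    def to_json(self) -> str:
        """Serialize so that later services can look up an index by name."""
        return json.dumps({n: asdict(e) for n, e in self.entries.items()}, indent=2)


registry = FakePCRRegistry()
verity = registry.register("dm-verity")
print(hex(verity.nv_index))
print(registry.to_json())
```

Keeping a name-to-index mapping is what would let other components find the right index without hardcoding it; where the real mapping would live is left open here.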
2. The question is where (in which piece of code) the system
extensions should be measured. There are two potential places. The
first is when we activate them, from userspace code. That would be
trivial for us to add, since we have all the internal APIs after all,
i.e. we could just use the aforementioned pcrextend APIs, once we have
them, to allocate a fake PCR and then immediately measure into it.
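The userspace option, sketched under stated assumptions: FakePCR is an in-memory stand-in for the NV index, and the "sysext:<name>:<sha256>" event string is a hypothetical format, not one systemd defines. The property that matters is ordering, i.e. measuring before the image is put to use:

```python
import hashlib
import os
import tempfile


def sha256_file(path: str) -> str:
    """Hash an image file in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


class FakePCR:
    """In-memory stand-in for an NV-index-backed "fake PCR" (hypothetical)."""

    def __init__(self) -> None:
        self.value = bytes(32)
        self.log: list[str] = []

    def extend(self, event: str) -> None:
        data = event.encode()
        self.value = hashlib.sha256(self.value + hashlib.sha256(data).digest()).digest()
        self.log.append(event)


def activate_sysext(pcr: FakePCR, name: str, image_path: str) -> None:
    # Measure before use: the event is recorded before the image is merged.
    event = f"sysext:{name}:{sha256_file(image_path)}"  # hypothetical event format
    pcr.extend(event)
    # The actual merge/overlay setup would follow here.


with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "foo.raw")
    with open(image, "wb") as f:
        f.write(b"placeholder bytes standing in for a sysext image")
    pcr = FakePCR()
    activate_sysext(pcr, "foo", image)
    print(pcr.log[0])
```

Hashing the whole image is simple but costs a full read; for dm-verity-protected images the root hash already commits to the contents and is far cheaper to measure, which is also what the kernel-side idea measures.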
The second, and possibly nicer, option is to measure this in kernel
space. I was discussing this at last week's LSFMMBPF conference
with various relevant folks, and one idea we came up with is
something like this:
a) introduce a BPF kfunc for TPM measurements in the kernel, so
that BPF code loaded into the kernel can do measurements. This
would require an upstream kernel patch, but the BPF folks seemed
kinda on board with that.
b) then put together a small BPF LSM for the Linux kernel that
hooks into dm-verity activation and does two things: it
measures the root hash of the device (plus some metadata such as
the DM device name), and writes a quick log message into a BPF
ring buffer for userspace. Userspace would then read that and
ensure the log ends up in the measurement logs systemd maintains
anyway.
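One way the record passed from the BPF LSM to userspace could look. The layout, field sizes and version field are hypothetical; DM_NAME_LEN (128) is the real device-mapper name limit. The sketch packs and parses the record in Python purely to show the shape of the data:

```python
import struct

# HYPOTHETICAL layout of one sample the BPF LSM could push into its ring buffer,
# written as the equivalent C struct:
#   struct verity_event {
#       __u32 version;
#       __u32 root_hash_len;   /* bytes of root_hash[] actually used */
#       __u8  root_hash[64];   /* room for up to a SHA-512 root hash */
#       char  dm_name[128];    /* DM_NAME_LEN, NUL-padded */
#   };
RECORD = struct.Struct("=II64s128s")  # native byte order, standard sizes, no padding: 200 bytes


def pack_event(root_hash: bytes, dm_name: str, version: int = 1) -> bytes:
    """Kernel side (simulated): build the bytes that are both measured and logged."""
    name = dm_name.encode()
    if len(root_hash) > 64 or len(name) >= 128:
        raise ValueError("field too long for the record layout")
    return RECORD.pack(version, len(root_hash), root_hash, name)  # 's' fields are NUL-padded


def parse_event(raw: bytes) -> dict:
    """Userspace side: decode one ring buffer sample for the measurement log."""
    version, hash_len, hash_buf, name_buf = RECORD.unpack(raw)
    return {
        "version": version,
        "root_hash": hash_buf[:hash_len].hex(),
        "dm_name": name_buf.split(b"\0", 1)[0].decode(),
    }


sample = pack_event(bytes.fromhex("ab" * 32), "foo-verity")
print(RECORD.size, parse_event(sample))
```

Measuring exactly the bytes that are also sent to userspace keeps replay simple: the log entry is, byte for byte, what was hashed into the fake PCR.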
In systemd we already ship and load some BPF LSMs, so adding another
one like the above should be relatively straightforward.
(Of course, it's a bit more complicated than this, because a BPF
kfunc that can measure into a PCR is not going to be enough [NB:
the kernel already has general code to measure into PCRs]: after
all, we want to measure into a "fake PCR" nvindex, for which the
kernel has no existing code yet. Somebody would have to write that
first, but it should be manageable.)
Putting this all together (under the assumption we go for the bpf-lsm
option), the codeflow would be something like this:
1. Early during boot, systemd allocates a "fake PCR" for dm-verity
measurements, from userspace.
2. It then loads the small BPF LSM that makes sure all dm-verity
activations are measured, and parameterizes it with the allocated
fake PCR nvindex.
3. A BPF ring buffer is kept in place that will receive the measurement
log from the BPF LSM, and some code in userspace picks the data up
from there and writes it to the usual measurement log.
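The three steps can be simulated end to end in a few lines. Everything here is a stand-in: a dict plays the allocated fake PCR, a deque plays the BPF ring buffer, and the JSON record format and handle number are hypothetical. A verifier then replays the userspace log against the fake PCR value:

```python
import hashlib
import json
from collections import deque


def extend(value: bytes, data: bytes) -> bytes:
    # Simulated SHA-256 extend: new = SHA256(old || SHA256(data))
    return hashlib.sha256(value + hashlib.sha256(data).digest()).digest()


# Step 1: userspace allocates a "fake PCR" early at boot (simulated as a dict entry).
fake_pcrs = {0x01000100: bytes(32)}   # HYPOTHETICAL NV index -> current value

# Step 2: the LSM, parameterized with that index, measures each dm-verity activation
# and publishes the same bytes to a ring buffer (a deque stands in for the BPF ring buffer).
ringbuf: deque = deque()


def on_verity_activation(nv_index: int, dm_name: str, root_hash: str) -> None:
    record = json.dumps({"dm_name": dm_name, "root_hash": root_hash}, sort_keys=True).encode()
    fake_pcrs[nv_index] = extend(fake_pcrs[nv_index], record)
    ringbuf.append(record)


# Step 3: a userspace reader drains the ring buffer into the regular measurement log.
measurement_log: list[bytes] = []


def drain() -> None:
    while ringbuf:
        measurement_log.append(ringbuf.popleft())


def verify(nv_index: int) -> bool:
    """A verifier replays the log and compares it with the fake PCR value."""
    value = bytes(32)
    for record in measurement_log:
        value = extend(value, record)
    return value == fake_pcrs[nv_index]


idx = 0x01000100
on_verity_activation(idx, "foo-verity", "ab" * 32)
on_verity_activation(idx, "bar-verity", "cd" * 32)
drain()
print(verify(idx))  # True
```

One real-world wrinkle the simulation hides: a BPF ring buffer is bounded, and bpf_ringbuf_reserve() returns NULL when it is full, so a real LSM has to decide what to do when it can measure but cannot log.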
And then we should have a really nice, very comprehensive solution.
Work towards making this a reality would of course be very welcome.
(Full disclosure: you can use IMA today to measure all dm-verity root
hashes into the IMA logs, but I personally am not a fan of IMA. It's a
complex beast with so many features I find quite questionable today
that I'd rather have a much, much simpler BPF LSM as an alternative,
one that just does this one thing and nothing else. IMA keeps its logs
in kernel memory, unbounded, with no mechanism for rotation, which I
personally find a complete dealbreaker.)
So much for my current ideas regarding all this.
Lennart
--
Lennart Poettering, Berlin