Re: Measured systemd-sysext

Dimitris Karakasilis <dimitris@xxxxxxxxxxxxxx> · Mon, 27 May 2024 10:00:45 +0300

On 24/5/24 18:44, Lennart Poettering wrote:
On Fr, 24.05.24 17:39, Dimitris Karakasilis (dimitris@xxxxxxxxxxxxxx) wrote:

we (at kairos.io) are trying to understand how systemd-sysext
extensions can
Hmm, I thought kairos wasn't so fond of systemd?
Why would you think that? Kairos is distro-agnostic, so it tries to
work on OpenRC-based distros as well, but to be honest the systemd-based
ones are better supported and tested.
Thanks for the detailed information (below). We are not familiar enough
with these features to contribute an implementation, but we'll keep an
eye on development and contribute in any way we can.

also be made tamper-proof by being measured in a system that boots in UKI
mode.
It's pretty simple: there's no nice support for comprehensively
measuring sysext images right now. There's support for measuring into
PCR 13 the sysext images passed into the UKI, but that's pretty much
it: there's no support for measuring sysexts activated from other
sources and later during runtime.
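
For context, "measuring into a PCR" is hash-chaining: the register is replaced by the hash of its old value concatenated with the digest of the new measurement. The following is a minimal Python sketch of that rule for the SHA-256 bank. The event strings are made up for illustration, and this is a model of the TPM behaviour, not systemd code.

```python
import hashlib

PCR_SIZE = hashlib.sha256().digest_size  # 32 bytes for the SHA-256 bank


def extend(pcr: bytes, data: bytes) -> bytes:
    """Return the new PCR value after measuring `data` (TPM-style extend)."""
    digest = hashlib.sha256(data).digest()
    return hashlib.sha256(pcr + digest).digest()


def replay(events: list[bytes]) -> bytes:
    """Recompute the PCR value a verifier expects from a measurement log."""
    pcr = bytes(PCR_SIZE)  # model assumption: the register starts as all zeroes
    for event in events:
        pcr = extend(pcr, event)
    return pcr


# Hypothetical sysext activations during one boot (illustrative names only).
log = [b"sysext:debug-tools:v1", b"sysext:debug-tools:v2"]
pcr_after_first = replay(log[:1])
pcr_after_second = replay(log)
```

Every additional activation moves the register to a new value, so the final value only makes sense together with the log that produced it.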

So there are two issues:

1. Right now we don't really have another PCR to spare. The various
    PCRs systemd measures stuff into right now contain measurements
    that typically happen only once during boot. That makes them really
    nice for validating/attesting boot success, or to bind policy to
    and so on, as they are relatively stable, they "settle"
    eventually. Measurements of sysexts on activation are different
    from that, after all sysexts are added/removed/updated during
    runtime all the time, hence they probably should be expected to be
    a continuing series of measurements, one for each activation during
    runtime. That makes them nice for attestation, but much less useful
    for binding policy to. Hence, I think there's a strong reason to
    keep these measurements separate from the existing measurements,
    i.e. place them in a separate PCR – but we have none left.

    Now, TPM2 allows adding new "fake" PCRs via a special type of
    nvindex so that this restriction goes away. It's high on our todo
    list to have an API for "registering" such "fake" PCRs (which would
    mean: allocating the nvindex with an appropriate locked-down policy,
    and then storing information about this somewhere). This should
    probably be placed in systemd-pcrextend@.service (which already
    provides an API to measure arbitrary stuff to arbitrary PCRs), so it
    looks like it would be a nice place to allow measuring arbitrary
    stuff to "fake" PCRs, and allocating them. This is probably not
    particularly involved, but so far no one has worked on this.

2. The question is where (in which piece of code) the system
    extensions should be measured. There are two potential places. The
    first is when we activate them, from userspace code. That would be
    trivial for us to add, as we have all the internal APIs after all,
    i.e. we could just use the aforementioned pcrextend APIs, once we
    have them, to allocate a fake PCR and then immediately measure into
    it.

    However, what might be nicer would be to measure this in kernel
    space. I was discussing this at last week's LSFMMBPF conference
    with various relevant folks, and one idea we came up with is
    something like this:

    a) introduce a BPF kfunc for TPM measurements in the kernel, so
       that BPF code loaded into the kernel can do measurements. This
       would require an upstream kernel patch, but the BPF folks seemed
       kinda on board with that.

    b) then put together a small BPF LSM for the Linux kernel that
       hooks into the dm-verity activation, and does two things:
       measures the root hash of the device (plus some metadata such as
       the DM device name), and writes a quick log message into a bpf
       ringbuffer to userspace. Userspace would then read that and
       ensure the log ends up in the measurement logs systemd maintains
       anyway.

    In systemd we already ship and load some BPF LSMs, adding another
    like the above should be relatively straightforward.

    (Of course, it's a bit more complicated than this, because a BPF
    kfunc that can measure into a PCR is not going to be enough [NB:
    the kernel already has general code to measure into PCRs], after
    all we want to measure into a "fake PCR" nvindex, which the kernel
    has no existing code for yet. Somebody would have to write that
    first, but it should be manageable).
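
To make the "registering a fake PCR" idea from issue 1 more concrete, here is a rough Python model of what such an API might keep track of. Everything in it is an assumption for illustration: the class names, the allocation strategy, and the idea of handing out handles sequentially are hypothetical, and no real TPM is involved. The only fixed fact used is that TPM2 NV index handles live in the range 0x01000000 to 0x01FFFFFF.

```python
import hashlib
from dataclasses import dataclass, field

NV_INDEX_FIRST = 0x01000000  # start of the TPM2 NV index handle range
NV_INDEX_LAST = 0x01FFFFFF   # end of that range


@dataclass
class FakePcr:
    """Software stand-in for an extend-only NV index (hypothetical)."""
    name: str
    nvindex: int
    value: bytes = bytes(32)

    def extend(self, data: bytes) -> None:
        # Same hash-chaining rule as a regular PCR; there is deliberately
        # no method to overwrite or reset the value.
        digest = hashlib.sha256(data).digest()
        self.value = hashlib.sha256(self.value + digest).digest()


@dataclass
class FakePcrRegistry:
    """Hypothetical bookkeeping: which name owns which nvindex."""
    next_index: int = NV_INDEX_FIRST
    by_name: dict = field(default_factory=dict)

    def register(self, name: str) -> FakePcr:
        if name in self.by_name:  # registering the same name twice is a no-op
            return self.by_name[name]
        if self.next_index > NV_INDEX_LAST:
            raise RuntimeError("NV index range exhausted")
        pcr = FakePcr(name, self.next_index)
        self.next_index += 1
        self.by_name[name] = pcr
        return pcr


registry = FakePcrRegistry()
verity_pcr = registry.register("dm-verity")
verity_pcr.extend(b"example measurement")
```

A real implementation would additionally have to define the nvindex on the TPM with a locked-down policy and persist the name-to-handle mapping somewhere; the dictionary above only stands in for that.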

Putting this all together (under the assumption we go for the bpf-lsm
option), the codeflow would be something like this:

1. early during boot, systemd allocates a "fake PCR" for dm-verity
    measurements, from userspace

2. it then loads the small BPF LSM that makes sure all dm-verity
    activations are measured, and parameterizes it with the allocated
    fake PCR nvindex.

3. A bpf ringbuffer is kept in place that will receive the measurement
    log from the bpf lsm, and some code in userspace picks the data up
    from there and writes it to the usual measurement log.

And then we should have a really nice, very comprehensive solution.

Work towards making this a reality would be very welcome of course.

(Full disclosure: you can use IMA today to measure all dm-verity root
hashes into the IMA logs, but I personally am not a fan of IMA, it's a
complex beast with so many features I find quite questionable today,
that I'd rather have a much much simpler lsm-bpf as alternative, that
just does this one thing and nothing else. IMA keeps its logs in
kernel memory, unbounded, with no mechanism for rotation, which I
personally find a complete dealbreaker.)

So much for my current ideas regarding all this.

Lennart

--
Lennart Poettering, Berlin