Re: Measured systemd-sysext

Lennart Poettering <lennart@xxxxxxxxxxxxxx> · Fri, 24 May 2024 17:44:55 +0200

On Fr, 24.05.24 17:39, Dimitris Karakasilis (dimitris@xxxxxxxxxxxxxx) wrote:

> we (at kairos.io) are trying to understand how systemd-sysext
> extensions can

Hmm, I thought kairos wasn't so fond of systemd?

> also be made tamper-proof by being measured in a system that boots in UKI
> mode.

It's pretty simple: there's no nice support for comprehensively
measuring sysext images right now. There's support for measuring into
PCR 13 the sysext images passed into the UKI, but that's pretty much
it: there's no support for measuring sysexts activated from other
sources and later during runtime.

So there are two issues:

1. Right now we don't really have another PCR to spare. The various
   PCRs systemd measures stuff into right now contain measurements
   that typically happen only once during boot. That makes them really
   nice for validating/attesting boot success, or to bind policy to
   and so on, as they are relatively stable, they "settle"
   eventually. Measurements of sysexts on activation are different
   from that, after all sysexts are added/removed/updated during
   runtime all the time, hence they probably should be expected to be
   a continuing series of measurements, one for each activation during
   runtime. That makes them nice for attestation, but much less useful
   for binding policy to. Hence, I think there's a strong reason to
   keep these measurements separate from the existing measurements,
   i.e. place them in a separate PCR – but we have none left.

   Now, TPM2 allows adding new "fake" PCRs via a special type of
   nvindex so that this restriction goes away. It's high on our todo
   list to have an API for "registering" such "fake" PCRs (which would
   mean: allocating the nvindex with an appropriate locked down policy,
   and then storing information about this somewhere). This should
   probably be placed in systemd-pcrextend@.service (which already
   provides an API to measure arbitrary stuff to arbitrary PCRs, so it
   looks like it would be a nice place to allow measuring arbitrary
   stuff to "fake" PCRs, and allocating them). This is probably not
   particularly involved, but so far no one has worked on this.

2. The question is where (in which piece of code) the system
   extensions should be measured. There are two potential places. The
   first is userspace code, at the moment we activate them. That would
   be trivial to add for us, we have all the internal APIs after
   all. I.e. we could just use the aforementioned pcrextend APIs, once
   we have them, to allocate a fake PCR and then immediately measure
   into it.

   However, what might be nicer would be to measure this in kernel
   space. I was discussing this at last week's LSFMMBPF conference
   with various relevant folks, and one idea we came up with is
   something like this:

   a) introduce a BPF kfunc for TPM measurements in the kernel, so
      that BPF code loaded into the kernel can do measurements. This
      would require an upstream kernel patch, but the BPF folks seemed
      kinda on board with that.

   b) then put together a small BPF LSM for the Linux kernel that
      hooks into the dm-verity activation, and does two things:
      measures the root hash of the device (plus some metadata such as
      the DM device name), and writes a quick log message into a bpf
      ringbuffer to userspace. Userspace would then read that and
      ensure the log ends up in the measurement logs systemd maintains
      anyway.

   In systemd we already ship and load some BPF LSMs, adding another
   like the above should be relatively straight-forward.

   (Of course, it's a bit more complicated than this, because a BPF
   kfunc that can measure into a PCR is not going to be enough [NB:
   the kernel already has general code to measure into PCRs], after
   all we want to measure into a "fake PCR" nvindex, which the kernel
   has no existing code for yet. Somebody would have to write that
   first, but it should be manageable).
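
To illustrate point 1, here is a minimal sketch of extend semantics,
assuming the usual SHA-256 rule new = H(old || digest). It is a toy
model in plain Python, not TPM or systemd code. The event names and
root hash strings are made up for illustration. It shows why a
boot-time PCR "settles" while a PCR that is extended on every sysext
activation keeps changing.

```python
import hashlib

PCR_SIZE = hashlib.sha256().digest_size  # 32 bytes for a SHA-256 bank


def extend(pcr_value: bytes, data: bytes) -> bytes:
    """Return the PCR value after extending it with a measurement of data."""
    digest = hashlib.sha256(data).digest()
    return hashlib.sha256(pcr_value + digest).digest()


# Boot-time PCR: a fixed set of measurements, made once, in a fixed order.
boot_pcr = bytes(PCR_SIZE)
for event in (b"kernel", b"initrd", b"cmdline"):  # illustrative events
    boot_pcr = extend(boot_pcr, event)
settled = boot_pcr  # no further extends, so policy can be bound to this

# Hypothetical "fake PCR" for sysexts: extended on each runtime activation.
sysext_pcr = bytes(PCR_SIZE)
history = []
for root_hash in (b"roothash-a", b"roothash-b", b"roothash-a"):
    sysext_pcr = extend(sysext_pcr, root_hash)
    history.append(sysext_pcr)

print("boot PCR settled at:", settled.hex())
print("distinct sysext PCR values seen:", len(set(history)))
```

Note that re-activating the same image (roothash-a) still yields a new
value, since the result depends on the whole history. That is fine
for attestation against a log, but awkward for binding policy to.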

Putting this all together (under the assumption we go for the bpf-lsm
option), the codeflow would be something like this:

1. early during boot, systemd allocates a "fake PCR" for dm-verity
   measurements, from userspace

2. it then loads the small BPF LSM that makes sure all dm-verity
   activations are measured, and parameterizes it with the allocated
   fake PCR nvindex.

3. A bpf ringbuffer is kept in place that will receive the measurement
   log from the bpf lsm, and some code in userspace picks the data up
   from there and writes it to the usual measurement log.
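
The three steps above can be sketched as a userspace-only simulation.
This is a toy model under stated assumptions, not kernel, BPF or
systemd code: the FakePcr class, the nvindex value, the record format
and all function names are hypothetical. A deque stands in for the
BPF ring buffer, and a list of JSON lines for the measurement log.

```python
import hashlib
import json
from collections import deque


class FakePcr:
    """Toy stand-in for an extend-only nvindex ("fake PCR")."""

    def __init__(self, nvindex: int) -> None:
        self.nvindex = nvindex
        self.value = bytes(32)

    def extend(self, digest: bytes) -> None:
        self.value = hashlib.sha256(self.value + digest).digest()


def record_digest(name: str, root_hash: str) -> bytes:
    # What gets measured: the root hash plus metadata (the DM device name).
    return hashlib.sha256(f"{name}:{root_hash}".encode()).digest()


def on_verity_activation(pcr, ringbuf, name, root_hash):
    """Stand-in for the BPF LSM hook: measure, then notify userspace."""
    pcr.extend(record_digest(name, root_hash))
    ringbuf.append({"nvindex": pcr.nvindex, "name": name,
                    "root_hash": root_hash})


def drain(ringbuf, log):
    """Stand-in for the userspace reader: move records into the log."""
    while ringbuf:
        log.append(json.dumps(ringbuf.popleft(), sort_keys=True))


def replay(log):
    """Recompute the expected fake PCR value from the log alone."""
    value = bytes(32)
    for line in log:
        rec = json.loads(line)
        digest = record_digest(rec["name"], rec["root_hash"])
        value = hashlib.sha256(value + digest).digest()
    return value


pcr = FakePcr(nvindex=0x01D10200)  # step 1 (made-up nvindex value)
ringbuf, log = deque(), []         # step 3: ring buffer and log
on_verity_activation(pcr, ringbuf, "sysext-foo", "ab" * 32)  # step 2
on_verity_activation(pcr, ringbuf, "sysext-bar", "cd" * 32)
drain(ringbuf, log)
print("log matches fake PCR:", replay(log) == pcr.value)
```

The point of replay() is that a verifier can check the log against
the fake PCR value: a record that is dropped or altered in the log
makes the recomputed value diverge from the measured one.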

And then we should have a really nice, very comprehensive solution.

Work towards making this a reality would be very welcome of course.

(Full disclosure: you can use IMA today to measure all dm-verity root
hashes into the IMA logs, but I personally am not a fan of IMA, it's a
complex beast with so many features I find quite questionable today,
that I'd rather have a much much simpler lsm-bpf as alternative, that
just does this one thing and nothing else. IMA keeps its logs in
kernel memory, unbounded, with no mechanism for rotation, which I
personally find a complete dealbreaker.)

So much for my current ideas regarding all this.

Lennart

--
Lennart Poettering, Berlin