On Fr, 24.05.24 17:39, Dimitris Karakasilis (dimitris@xxxxxxxxxxxxxx) wrote:

> we (at kairos.io) are trying to understand how systemd-sysext
> extensions can

Hmm, I thought kairos wasn't so fond of systemd?

> also be made tamper-proof by being measured in a system that boots in UKI
> mode.

It's pretty simple: there's no nice support for comprehensively
measuring sysext images right now. There's support for measuring the
sysext images passed into the UKI into PCR 13, but that's pretty much
it: there's no support for measuring sysexts that are activated from
other sources or later during runtime.

So there are two issues:

1. Right now we don't really have another PCR to spare. The various
   PCRs systemd measures stuff into right now contain measurements
   that typically happen only once during boot. That makes them really
   nice for validating/attesting boot success, or to bind policy to
   and so on, as they are relatively stable: they "settle"
   eventually. Measurements of sysexts on activation are different
   from that. After all, sysexts are added/removed/updated during
   runtime all the time, hence they should probably be expected to be
   a continuing series of measurements, one for each activation during
   runtime. That makes them nice for attestation, but much less useful
   for binding policy to. Hence, I think there's a strong reason to
   keep these measurements separate from the existing measurements,
   i.e. place them in a separate PCR – but we have none left.

   Now, TPM2 allows adding new "fake" PCRs via a special type of
   nvindex, so that this restriction goes away. It's high on our todo
   list to have an API for "registering" such "fake" PCRs (which would
   mean: allocating the nvindex with an appropriate locked-down
   policy, and then storing information about this somewhere).
   This API should probably be placed in systemd-pcrextend@.service
   (which already provides an API to measure arbitrary stuff to
   arbitrary PCRs, so it looks like it would be a nice place to allow
   measuring arbitrary stuff to "fake" PCRs, and allocating them).
   This is probably not particularly involved, but so far no one has
   worked on this.

2. The question is where (in which piece of code) the system
   extensions should be measured. There are two potential places. The
   first is when we activate them, from userspace code. That would be
   trivial to add for us. We have all the internal APIs after all,
   i.e. we could just use the aforementioned pcrextend APIs, once we
   have them, to allocate a fake PCR and then immediately measure
   into it.

   However, what might be nicer would be to measure this in kernel
   space. I was discussing this at last week's LSFMMBPF conference
   with various relevant folks, and one idea we came up with is
   something like this:

   a) introduce a BPF kfunc for TPM measurements in the kernel, so
      that BPF code loaded into the kernel can do measurements. This
      would require an upstream kernel patch, but the BPF folks seemed
      kinda on board with that.

   b) then put together a small BPF LSM for the Linux kernel that
      hooks into the dm-verity activation, and does two things: it
      measures the root hash of the device (plus some metadata such
      as the DM device name), and it writes a quick log message into
      a BPF ring buffer to userspace. Userspace would then read that
      and ensure the log ends up in the measurement logs systemd
      maintains anyway.

   In systemd we already ship and load some BPF LSMs, so adding
   another one like the above should be relatively straightforward.

   (Of course, it's a bit more complicated than this, because a BPF
   kfunc that can measure into a PCR is not going to be enough [NB:
   the kernel already has general code to measure into PCRs]. After
   all, we want to measure into a "fake PCR" nvindex, which the kernel
   has no existing code for yet.
   Somebody would have to write that first, but it should be
   manageable).

Putting this all together (under the assumption we go for the bpf-lsm
option), the code flow would be something like this:

1. Early during boot, systemd allocates a "fake PCR" for dm-verity
   measurements, from userspace.

2. It then loads the small BPF LSM that makes sure all dm-verity
   activations are measured, and parameterizes it with the allocated
   fake PCR nvindex.

3. A BPF ring buffer is kept in place that will receive the
   measurement log from the BPF LSM, and some code in userspace picks
   the data up from there and writes it to the usual measurement log.

And then we should have a really nice, very comprehensive solution.

Work towards making this a reality would be very welcome of course.

(Full disclosure: you can use IMA today to measure all dm-verity root
hashes into the IMA logs, but I personally am not a fan of IMA. It's a
complex beast with so many features I find quite questionable today
that I'd rather have a much, much simpler lsm-bpf as an alternative
that just does this one thing and nothing else. IMA keeps its logs in
kernel memory, unbounded, with no mechanism for rotation, which I
personally find a complete dealbreaker.)

So much for my current ideas regarding all this.

Lennart

--
Lennart Poettering, Berlin
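
For illustration only, the three-step code flow described above can be modelled entirely in userspace. The sketch below is not systemd or kernel code and talks to no TPM: the names FakePCR, on_verity_activation, drain and replay are hypothetical, the "name:roothash" record format is an assumption, and the extend rule (new = SHA-256(old || SHA-256(record)), starting from all zeroes) is a simplified model of a PCR-style extend. It shows why a continuing series of measurements is still fine for attestation (the log can be replayed to reproduce the register value) while being awkward for policy binding (the value changes with every activation).

```python
import hashlib
from collections import deque

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def extend(old: bytes, record: bytes) -> bytes:
    """Simplified extend rule: new = SHA-256(old || SHA-256(record))."""
    return hashlib.sha256(old + hashlib.sha256(record).digest()).digest()


class FakePCR:
    """Hypothetical stand-in for an nvindex-backed "fake" PCR (no TPM involved)."""

    def __init__(self) -> None:
        # Assumption: the register starts out as all zeroes.
        self.value = bytes(DIGEST_SIZE)

    def measure(self, record: bytes) -> None:
        self.value = extend(self.value, record)


def on_verity_activation(pcr: FakePCR, ringbuf: deque, dm_name: str, root_hash_hex: str) -> None:
    """Models step 2: measure device name plus root hash, then queue a log record."""
    record = f"{dm_name}:{root_hash_hex}".encode()  # assumed record format
    pcr.measure(record)
    ringbuf.append(record)


def drain(ringbuf: deque, log: list) -> None:
    """Models step 3: userspace moves queued records into the measurement log."""
    while ringbuf:
        log.append(ringbuf.popleft())


def replay(log: list) -> bytes:
    """What a verifier does: recompute the register value from the log alone."""
    value = bytes(DIGEST_SIZE)
    for record in log:
        value = extend(value, record)
    return value


if __name__ == "__main__":
    pcr, ringbuf, log = FakePCR(), deque(), []  # step 1: "allocate" the fake PCR
    on_verity_activation(pcr, ringbuf, "sysext-a", "aa" * 32)
    on_verity_activation(pcr, ringbuf, "sysext-b", "bb" * 32)
    drain(ringbuf, log)
    print("log replays to register value:", replay(log) == pcr.value)
```

In this model the register value after the second activation differs from the value after the first, so a policy bound to one fixed value would break on every sysext change, whereas a verifier holding the log can always replay it and compare.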