On Fr, 07.06.24 14:09, Mikko Rapeli (mikko.rapeli@xxxxxxxxxx) wrote:

> > How is this supposed to work anyway? is the supplicant supposed to
> > exit before initd transition, and be started anew after the
> > transition?
>
> Yes, and tee-supplicant must be started again before any of the TPM using services.
> This now works for initrd start and also shutdown, but fails in main rootfs
> where services like systemd-pcrmachine.service, systemd-tpm2-setup.service and
> systemd-pcrfs-root.service fail since TPM device is not functional without
> tee-supplicant in userspace.

So how do you enqueue tpm2.target again? Via the unmodified upstream
systemd-tpm2-generator?

So the upstream generator assumes that if /dev/tpmrm0 already exists
it doesn't need to bother with tpm2.target, and that the TPM device
already works. But that's not really the case for you I guess, you
have a TPM device node *before* it actually works, right? You need to
start the tee service for it to start working, if I understood
correctly.

So I guess this is what happens:

When the generator runs early in the initrd it sees that /dev/tpmrm0
is absent, it enqueues tpm2.target to wait for it, which pulls in the
tee agent, and all is good. Afterwards we do the initrd→host
transition. When the generator then runs again, early in the host fs,
it sees that /dev/tpmrm0 already exists, and doesn't do anything.
Hence all sync'ing is off and stuff will start using the TPM before it
is usable.

I guess to fix this we have to somehow ensure that after the
transition we'll detect that the /dev/tpmrm0 device is not actually
usable, and we have to enqueue tpm2.target after all.

Is there any reasonable way we can detect this? For example, for this
kind of TPM device is there maybe a sysfs attribute file in
/sys/class/tpm/tpm0/ or so which tells us whether the device already
works, or if it needs some userspace component? Note that at that
point udev is not operable anymore/yet, hence we cannot just ask the
udev db for this.
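
To make the suspected sequence concrete, here is a toy model of the
reasoning above. It is not the actual systemd-tpm2-generator code: the
function names are made up for illustration, and the only rule it
encodes is the one described above (device node present means "nothing
to wait for").

```python
def generator_enqueues_tpm2_target(device_node_exists: bool) -> bool:
    """Toy rule: only wait for the TPM if /dev/tpmrm0 is absent."""
    return not device_node_exists


def simulate_boot() -> list[tuple[str, bool, bool]]:
    """Return (phase, tpm2.target enqueued, TPM users may race) per phase."""
    phases = [
        # (phase, /dev/tpmrm0 exists, tee-supplicant running)
        ("initrd", False, False),  # node not there yet, generator waits
        ("host", True, False),     # node survives switch-root, supplicant is gone
    ]
    results = []
    for phase, node_exists, supplicant_running in phases:
        enqueued = generator_enqueues_tpm2_target(node_exists)
        # Without tpm2.target nothing orders TPM users after the supplicant,
        # so they can run while the device is still unusable.
        racy = not enqueued and not supplicant_running
        results.append((phase, enqueued, racy))
    return results


if __name__ == "__main__":
    for phase, enqueued, racy in simulate_boot():
        print(f"{phase}: tpm2.target enqueued={enqueued}, race possible={racy}")
```

In this model the host phase is the only one where tpm2.target is
skipped while the device is not usable, which matches the failures
reported for the main rootfs.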
> tee-supplicant-initrd@.service:
>
> [Unit]
> Description=TEE Supplicant on %i (initrd)
> DefaultDependencies=no
> After=dev-%i.device
> Wants=dev-%i.device tpm2.target
> Conflicts=shutdown.target tee-supplicant@teepriv0.service
> Before=tpm2.target sysinit.target shutdown.target tee-supplicant@teepriv0.service initrd-switch-root.target
>
> [Service]
> Type=simple

This is the default type, you can drop this.

That said, I am pretty sure this is actually not correct. Type=simple
means that we consider the service ready the instant we fork()ed off
the process for it. But that almost certainly means that the TPM
device is not ready to use yet, because at that point the TEE
supplicant won't even have opened the device it operates on, let alone
set it up.

So I'd expect the TEE service would use sd_notify() to send a
"READY=1" notification to the service manager once it did everything
so that /dev/tpmrm0 is ready to go. You'd then use Type=notify or
Type=notify-reload in the unit file to tell systemd that it shall wait
for sd_notify().

> EnvironmentFile=-@sysconfdir@/default/tee-supplicant
> ExecStart=@sbindir@/tee-supplicant $OPTARGS
>
> tee-supplicant@.service:
>
> [Unit]
> Description=TEE Supplicant on %i
> DefaultDependencies=no
> After=dev-%i.device
> Wants=dev-%i.device

Same here, this should pull in tpm2.target too.

> Conflicts=shutdown.target tee-supplicant-initrd@teepriv0.service
> Before=systemd-pcrmachine.service systemd-tpm2-setup.service sysinit.target shutdown.target
> After=tpm2.target initrd-switch-root.target tee-supplicant-initrd@teepriv0.service

These deps look incorrect, just use the same ones as up top.

>
> [Service]
> Type=simple
> EnvironmentFile=-@sysconfdir@/default/tee-supplicant
> ExecStart=@sbindir@/tee-supplicant $OPTARGS

I don't think the two service files need to differ between initrd and
host fs. Just use the same service file.
I don't see a reason for having two distinct unit files, just use the
one you listed above as tee-supplicant-initrd@.service for both cases
(and drop the -initrd suffix).

> > Please provide proper boot logs, with debug logging enabled.
>
> Debug logging is available from here, sadly log is too big to view
> nicely on the web page and has to be downloaded:
>
> https://ledge.validation.linaro.org/scheduler/job/88420

This indeed shows that tpm2.target doesn't get enqueued again after
the initrd transition.

So my educated guess above seems to be right, and we need to find a
way now to automatically determine from a TPM device node whether it
is ready to use or not. So far we assumed that if we have one it is
ready to use, but that appears to be incorrect for these TEE devices.

So how do we detect this case so that we can delay TPM operations
until the thing is working again via the tpm2.target stuff?

Lennart

--
Lennart Poettering, Berlin
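
To illustrate the Type=notify suggestion made above: the readiness
protocol is just a datagram sent to the AF_UNIX socket named in
$NOTIFY_SOCKET. The following is a minimal sketch of that protocol in
Python, for illustration only. tee-supplicant itself is a C program
and would more likely call sd_notify() from libsystemd; the point at
which it should notify (after the TEE device is opened and fully set
up) is an assumption based on the discussion above.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send a notification datagram to the service manager.

    Returns False if $NOTIFY_SOCKET is unset, i.e. the process was not
    started by a manager that expects notifications.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(path)
        sock.sendall(message.encode())
    return True


if __name__ == "__main__":
    # Hypothetical placeholder: open the TEE device and finish all setup
    # here, so that /dev/tpmrm0 is usable before readiness is reported.
    sd_notify("READY=1")
```

With Type=notify in the unit file, the service manager treats the
service as started only once the READY=1 message arrives, so units
ordered after it do not start early.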