On Wed, Jan 29, 2025 at 04:27:15PM +0100, Michal Suchánek wrote: > there is a problem report that booting a specific type of system about > 0.1% of the time encrypted volume (using a PCR to release the key) fails > to unlock because of TPM operation timeout. > > Minimizing the test case failed so far. > > For example, booting into text mode as opposed to graphical desktop > makes the problem unreproducible. > > The test is done with a frankenkernel that has TPM drivers about on par > with Linux 6.4 but using actual Linux 6.4 the problem is not > reproducible, either. > > However, given the problem takes up to a day to reproduce I do not have > much confidence in the negative results. So. We see what look like similar timeouts in our fleet, but I haven't managed to produce a reliable test case that gives me any confidence about what the cause is. https://lore.kernel.org/linux-integrity/Zv1810ZfEBEhybmg@xxxxxxxx/ for my previous post about this. > With some instrumentation it was determined that the problem happens > here: > static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len) > { > struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev); > int rc, status, burstcnt; > size_t count = 0; > bool itpm = test_bit(TPM_TIS_ITPM_WORKAROUND, &priv->flags); > > status = tpm_tis_status(chip); > if ((status & TPM_STS_COMMAND_READY) == 0) { > tpm_tis_ready(chip); > if (wait_for_tpm_stat > (chip, TPM_STS_COMMAND_READY, chip->timeout_b, > &priv->int_queue, false) < 0) { > >>> rc = -ETIME; > goto out_err; > } > } > localhost kernel: tpm tpm0: tpm_tis_send_data: 353: ready: Timed out (2236 of 2000 ms) > localhost kernel: tpm tpm0: tpm_tis_send_data: 353: ready: Took (2236 of 2000 ms) Can you track down the actual command that's taking the time? Though I guess that's the previous command rather than the one that hits the timeout. > 200ms is a very long time for kernel space yet minor changes to the test > setup avoid the problem. > > I am wondering if this timeout is insufficient if eg. the previous > operation can take up to TPM_LONG_LONG or perhaps if some of that time > should have been consumed by the previous operation but because of > insufficient locking somewhere the kernel got to waiting on the chip to > become ready too early. > > Note that these two volume unlocks are done by systemd services, and > happen in parallel as much as possible. > > The chip type: > > tpm_tis IFX1522:00: 2.0 TPM (device-id 0x1D, rev-id 54) > > Handle 0x0001, DMI type 43, 31 bytes > TPM Device > Vendor ID: IFX > Specification Version: 2.0 > Firmware Revision: 15.22 > Description: SLB9672 > Characteristics: > Family configurable via firmware update > Family configurable via platform software support > OEM-specific Information: 0x00000000 > > Any idea what could be the cause of this problem? We're using SLB9670 devices, so older than yours, but at least the same family. J. -- The end is nearer. | .''`. Debian GNU/Linux Developer | : :' : Happy to accept PGP signed | `. `' or encrypted mail - RSA | `- key on the keyservers.