Hello,

there is a problem report that on a specific type of system, in about
0.1% of boots, an encrypted volume (using a PCR to release the key)
fails to unlock because of a TPM operation timeout.

Minimizing the test case has failed so far.

For example, booting into text mode as opposed to the graphical desktop
makes the problem unreproducible.

The test is done with a frankenkernel that has TPM drivers about on par
with Linux 6.4, but with actual Linux 6.4 the problem is not
reproducible, either.

However, given that the problem takes up to a day to reproduce, I do not
have much confidence in the negative results.

With some instrumentation it was determined that the problem happens
here (at the line marked with >>>):

/*
 * If interrupts are used (signaled by an irq set in the vendor structure)
 * tpm.c can skip polling for the data to be available as the interrupt is
 * waited for here
 */
static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
{
	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
	int rc, status, burstcnt;
	size_t count = 0;
	bool itpm = test_bit(TPM_TIS_ITPM_WORKAROUND, &priv->flags);

	status = tpm_tis_status(chip);
	if ((status & TPM_STS_COMMAND_READY) == 0) {
		tpm_tis_ready(chip);
		if (wait_for_tpm_stat
		    (chip, TPM_STS_COMMAND_READY, chip->timeout_b,
		     &priv->int_queue, false) < 0) {
>>>			rc = -ETIME;
			goto out_err;
		}
	}

	while (count < len - 1) {
		burstcnt = get_burstcount(chip);
		if (burstcnt < 0) {
			dev_err(&chip->dev, "Unable to read burstcount\n");
			rc = burstcnt;
			goto out_err;
		}
		burstcnt = min_t(int, burstcnt, len - count - 1);
		rc = tpm_tis_write_bytes(priv, TPM_DATA_FIFO(priv->locality),
					 burstcnt, buf + count);
		if (rc < 0)
			goto out_err;

		count += burstcnt;

		if (wait_for_tpm_stat(chip, TPM_STS_VALID, chip->timeout_c,
					&priv->int_queue, false) < 0) {
			rc = -ETIME;
			goto out_err;
		}
		status = tpm_tis_status(chip);
		if (!itpm && (status & TPM_STS_DATA_EXPECT) == 0) {
			rc = -EIO;
			goto out_err;
		}
	}

	/* write last byte */
	rc = tpm_tis_write8(priv, TPM_DATA_FIFO(priv->locality), buf[count]);
	if (rc < 0)
		goto out_err;

	if (wait_for_tpm_stat(chip, TPM_STS_VALID, chip->timeout_c,
				&priv->int_queue, false) < 0) {
		rc = -ETIME;
		goto out_err;
	}
	status = tpm_tis_status(chip);
	if (!itpm && (status & TPM_STS_DATA_EXPECT) != 0) {
		rc = -EIO;
		goto out_err;
	}

	rc = tpm_tis_verify_crc(priv, len, buf);
	if (rc < 0) {
		dev_err(&chip->dev, "CRC mismatch for command.\n");
		goto out_err;
	}

	return 0;

out_err:
	tpm_tis_ready(chip);
	return rc;
}

localhost kernel: tpm tpm0: tpm_tis_send_data: 353: ready: Timed out (2236 of 2000 ms)
localhost kernel: tpm tpm0: tpm_tis_send_data: 353: ready: Took (2236 of 2000 ms)

2000 ms is a very long time for kernel space, yet minor changes to the
test setup avoid the problem.

I am wondering whether this timeout is insufficient if, e.g., the
previous operation can take up to TPM_LONG_LONG, or whether some of that
time should have been consumed by the previous operation but, because of
insufficient locking somewhere, the kernel started waiting for the chip
to become ready too early.

Note that the two volume unlocks are done by systemd services and
happen in parallel as much as possible.

The chip type:

tpm_tis IFX1522:00: 2.0 TPM (device-id 0x1D, rev-id 54)

Handle 0x0001, DMI type 43, 31 bytes
TPM Device
	Vendor ID: IFX
	Specification Version: 2.0
	Firmware Revision: 15.22
	Description: SLB9672
	Characteristics:
		Family configurable via firmware update
		Family configurable via platform software support
	OEM-specific Information: 0x00000000

Any idea what could be the cause of this problem?

Thanks

Michal