Re: TPM operation times out (very rarely)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 29, 2025 at 04:27:15PM +0100, Michal Suchánek wrote:
> there is a problem report that booting a specific type of system about
> 0.1% of the time encrypted volume (using a PCR to release the key) fails
> to unlock because of TPM operation timeout.
> 
> Minimizing the test case failed so far.
> 
> For example, booting into text mode as opposed to graphical desktop
> makes the problem unreproducible.
> 
> The test is done with a frankenkernel that has TPM drivers about on par
> with Linux 6.4 but using actual Linux 6.4 the problem is not
> reproducible, either.
> 
> However, given the problem takes up to a day to reproduce I do not have
> much confidence in the negative results.

So. We see what look like similar timeouts in our fleet, but I haven't
managed to produce a reliable test case that gives me any confidence
about what the cause is.

https://lore.kernel.org/linux-integrity/Zv1810ZfEBEhybmg@xxxxxxxx/

for my previous post about this.

> With some instrumentation it was determined that the problem happens
> here:

> static int tpm_tis_send_data(struct tpm_chip *chip, const u8 *buf, size_t len)
> {
> 	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
> 	int rc, status, burstcnt;
> 	size_t count = 0;
> 	bool itpm = test_bit(TPM_TIS_ITPM_WORKAROUND, &priv->flags);
> 
> 	status = tpm_tis_status(chip);
> 	if ((status & TPM_STS_COMMAND_READY) == 0) {
> 		tpm_tis_ready(chip);
> 		if (wait_for_tpm_stat
> 		    (chip, TPM_STS_COMMAND_READY, chip->timeout_b,
> 		     &priv->int_queue, false) < 0) {
> >>>			rc = -ETIME;
> 			goto out_err;
> 		}
> 	}

> localhost kernel: tpm tpm0: tpm_tis_send_data: 353: ready: Timed out (2236 of 2000 ms)
> localhost kernel: tpm tpm0: tpm_tis_send_data: 353: ready: Took (2236 of 2000 ms)

Can you track down the actual command that's taking the time? Though I
guess that's the previous command rather than the one that hits the
timeout.

> 200ms is a very long time for kernel space yet minor changes to the test
> setup avoid the problem.
> 
> I am wondering if this timeout is insufficient if eg. the previous
> operation can take up to TPM_LONG_LONG or perhaps if some of that time
> should have been consumed by the previous operation but because of
> insufficient locking somewhere the kernel got to waiting on the chip to
> become ready too early.
> 
> Note that these two volume unlocks are done by systemd services, and
> happen in parallel as much as possible.
> 
> The chip type:
> 
> tpm_tis IFX1522:00: 2.0 TPM (device-id 0x1D, rev-id 54)
> 
> Handle 0x0001, DMI type 43, 31 bytes
> TPM Device
>         Vendor ID: IFX
>         Specification Version: 2.0
>         Firmware Revision: 15.22
>         Description: SLB9672
>         Characteristics:
>                 Family configurable via firmware update
>                 Family configurable via platform software support
>         OEM-specific Information: 0x00000000
> 
> Any idea what could be the cause of this problem?

We're using SLB9670 devices, so older than yours, but at least the same
family.

J.

-- 
        The end is nearer.         |  .''`.  Debian GNU/Linux Developer
                                   | : :' :  Happy to accept PGP signed
                                   | `. `'   or encrypted mail - RSA
                                   |   `-    key on the keyservers.




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux Kernel]     [Linux Kernel Hardening]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux