Re: TPM operation times out (very rarely)

On Wed Jan 29, 2025 at 6:02 PM EET, Jonathan McDowell wrote:
> On Wed, Jan 29, 2025 at 04:27:15PM +0100, Michal Suchánek wrote:
> > there is a problem report that booting a specific type of system about
> > 0.1% of the time encrypted volume (using a PCR to release the key) fails
> > to unlock because of TPM operation timeout.
> > 
> > Minimizing the test case failed so far.
> > 
> > For example, booting into text mode as opposed to graphical desktop
> > makes the problem unreproducible.
> > 
> > The test is done with a frankenkernel that has TPM drivers about on par
> > with Linux 6.4 but using actual Linux 6.4 the problem is not
> > reproducible, either.
> > 
> > However, given the problem takes up to a day to reproduce I do not have
> > much confidence in the negative results.
>
> So. We see what look like similar timeouts in our fleet, but I haven't
> managed to produce a reliable test case that gives me any confidence
> about what the cause is.
>
> https://lore.kernel.org/linux-integrity/Zv1810ZfEBEhybmg@xxxxxxxx/
>
> for my previous post about this.

Ugh, this was my first week at a new job, sorry.

2000 ms is essentially the value straight from the spec, and using it
as-is can be a bad idea. Please look at Table 18 in [1].

My guess is that the GUI causes more activity in the system, which
could make latencies less predictable.

The most trivial candidate would be:

	status = tpm_tis_status(chip);
	if ((status & TPM_STS_COMMAND_READY) == 0) {
		tpm_tis_ready(chip);
		if (wait_for_tpm_stat
		    (chip, TPM_STS_COMMAND_READY, TPM_TIS_TIMEOUT_MAX /* e.g. 2250 ms */,
		     &priv->int_queue, false) < 0) {
			rc = -ETIME;
			goto out_err;
		}
	}

On the other hand, tpm_tis_send_main() initially looked odd to me:

	for (try = 0; try < TPM_RETRY; try++) {
		rc = tpm_tis_send_data(chip, buf, len);
		if (rc >= 0)
			/* Data transfer done successfully */
			break;
		else if (rc != -EIO)
			/* Data transfer failed, not recoverable */
			return rc;
	}

I.e. no retry on -ETIME.
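
To make that concrete, here is a minimal userspace model of the loop
above. It is only a sketch: struct fake_tpm, fake_send_data() and
send_main_model() are hypothetical names made up for illustration, the
stub merely replays scripted return codes in place of
tpm_tis_send_data(), and the TPM_RETRY value is an assumption. Only the
control flow is meant to match the quoted code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TPM_RETRY 50	/* assumed value, for illustration only */

/* Hypothetical stand-in for the chip: a script of return codes. */
struct fake_tpm {
	const int *results;	/* one return code per attempt */
	size_t nresults;	/* number of scripted attempts */
	size_t calls;		/* how many times we were called */
};

/* Hypothetical stub replacing tpm_tis_send_data(). */
static int fake_send_data(struct fake_tpm *t)
{
	int rc;

	assert(t && t->results);
	/* Past the end of the script, pretend the transfer succeeds. */
	rc = t->calls < t->nresults ? t->results[t->calls] : 0;
	t->calls++;
	return rc;
}

/* Same control flow as the retry loop quoted above. */
static int send_main_model(struct fake_tpm *t)
{
	int rc = -EIO;
	int try;

	for (try = 0; try < TPM_RETRY; try++) {
		rc = fake_send_data(t);
		if (rc >= 0)
			/* Data transfer done successfully */
			break;
		else if (rc != -EIO)
			/* Anything but -EIO, including -ETIME: give up */
			return rc;
	}
	return rc;
}
```

With a script of { -EIO, -EIO, 0 } the model calls the stub three times
and returns 0, whereas { -ETIME, 0 } returns -ETIME after a single
call, so the second (successful) attempt is never reached.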

But I would instead fix up tpm_common_write():

out:
	mutex_unlock(&priv->buffer_mutex);

	if (ret == -ETIME)
		return -ERESTARTSYS;

	return ret;
}

This way it can still be interrupted by a signal. A retry loop would
block for too long.

I'm not sure whether increasing the timeout value alone would be
enough, or whether both sites should be fixed up.

[1] https://trustedcomputinggroup.org/wp-content/uploads/PC-Client-Specific-Platform-TPM-Profile-for-TPM-2p0-v1p05p_r14_pub.pdf

BR, Jarkko




