On Wed, Jan 29, 2025 at 04:27:15PM +0100, Michal Suchánek wrote: > Hello, > > there is a problem report that booting a specific type of system about > 0.1% of the time encrypted volume (using a PCR to release the key) fails > to unlock because of TPM operation timeout. > > Minimizing the test case failed so far. > > For example, booting into text mode as opposed to graphical desktop > makes the problem unreproducible. > > The test is done with a frankenkernel that has TPM drivers about on par > with Linux 6.4 but using actual Linux 6.4 the problem is not > reproducible, either. > > However, given the problem takes up to a day to reproduce I do not have > much confidence in the negative results. Michal, can you possibly try the below and see if it helps out? There seems to be a timing bug introduced in 6.4+ that I think might be related, and matches up with some of our internal metrics that showed an increase in timeouts in 6.4 onwards. commit 79041fba797d0fe907e227012767f56dd93fac32 Author: Jonathan McDowell <noodles@xxxxxxxx> Date: Wed Feb 19 16:20:44 2025 -0600 tpm, tpm_tis: Fix timeout handling when waiting for TPM status The change to only use interrupts to handle supported status changes, then switch to polling for the rest, inverted the status test and sleep such that we can end up sleeping beyond our timeout and not actually checking the status. This can result in spurious TPM timeouts, especially on a more loaded system. Fix by switching the order back so we sleep *then* check. We've done a up front check when we enter the function so this won't cause an additional delay when the status is already what we're looking for. Cc: stable@xxxxxxxxxxxxxxx # v6.4+ Fixes: e87fcf0dc2b4 ("tpm, tpm_tis: Only handle supported interrupts") Signed-off-by: Jonathan McDowell <noodles@xxxxxxxx> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c index fdef214b9f6b..167d71747666 100644 --- a/drivers/char/tpm/tpm_tis_core.c +++ b/drivers/char/tpm/tpm_tis_core.c @@ -114,11 +114,11 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask, return 0; /* process status changes without irq support */ do { + usleep_range(priv->timeout_min, + priv->timeout_max); status = chip->ops->status(chip); if ((status & mask) == mask) return 0; - usleep_range(priv->timeout_min, - priv->timeout_max); } while (time_before(jiffies, stop)); return -ETIME; }