Re: TPM operation times out (very rarely)

On Fri Jan 31, 2025 at 10:35 AM EET, Michal Suchánek wrote:
> Hello,
>
> On Fri, Jan 31, 2025 at 01:31:01AM +0200, Jarkko Sakkinen wrote:
> > On Wed Jan 29, 2025 at 6:02 PM EET, Jonathan McDowell wrote:
> > > On Wed, Jan 29, 2025 at 04:27:15PM +0100, Michal Suchánek wrote:
> > > > there is a problem report that booting a specific type of system about
> > > > 0.1% of the time encrypted volume (using a PCR to release the key) fails
> > > > to unlock because of TPM operation timeout.
> > > > 
> > > > Minimizing the test case failed so far.
> > > > 
> > > > For example, booting into text mode as opposed to graphical desktop
> > > > makes the problem unreproducible.
> > > > 
> > > > The test is done with a frankenkernel that has TPM drivers about on par
> > > > with Linux 6.4 but using actual Linux 6.4 the problem is not
> > > > reproducible, either.
> > > > 
> > > > However, given the problem takes up to a day to reproduce I do not have
> > > > much confidence in the negative results.
> > >
> > > So. We see what look like similar timeouts in our fleet, but I haven't
> > > managed to produce a reliable test case that gives me any confidence
> > > about what the cause is.
> > >
> > > https://lore.kernel.org/linux-integrity/Zv1810ZfEBEhybmg@xxxxxxxx/
> > >
> > > for my previous post about this.
> > 
> > Ugh, this was my first week at new job, sorry.
> > 
> > 2000 ms is like a spec value, which can be a bad idea. Please look at
> > Table 18.
> > 
> > My guess is that GUI makes more stuff happening in the system, which
> > could make latencies more shaky.
> > 
> > The most trivial candidate would be:
> > 
> > 	status = tpm_tis_status(chip);
> > 	if ((status & TPM_STS_COMMAND_READY) == 0) {
> > 		tpm_tis_ready(chip);
> > 		if (wait_for_tpm_stat
> > 		    (chip, TPM_STS_COMMAND_READY, TPM_TIS_TIMEOUT_MAX /* e.g. 2250 ms */,
>
> 2250 is more than the measured 2226 but I have no idea if that's random
> or in some way deterministic.

Your text-mode vs. GUI observation is at least evidence of stochastic
behaviour, though not a full-fledged proof. You can expect e.g. more
IRQs when you run a GUI. I did not engineer that 2250 ms number; you
could just as well double the original value. The whole framework for
timeout_b is ridiculous (and if it is my doing, that does not change
the fact).
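
To make the arithmetic concrete, here is a minimal userspace sketch, not
kernel code. It assumes a simulated clock and a fake status register; the
names fake_chip, fake_status(), fake_wait_for_stat() and try_wait() are
all hypothetical and only mimic the shape of a bounded status poll. The
2000, 2226 and 2250 ms figures are the ones discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a "command ready" status bit. */
#define FAKE_STS_COMMAND_READY 0x40u

struct fake_chip {
	uint64_t now_ms;      /* simulated clock, advanced by the poll loop */
	uint64_t ready_at_ms; /* simulated time at which the ready bit appears */
};

/* Report the ready bit once the simulated device latency has elapsed. */
uint8_t fake_status(const struct fake_chip *chip)
{
	return chip->now_ms >= chip->ready_at_ms ? FAKE_STS_COMMAND_READY : 0;
}

/*
 * Poll until every bit in mask is set or timeout_ms has elapsed.
 * Returns 0 on success and -1 on timeout.
 */
int fake_wait_for_stat(struct fake_chip *chip, uint8_t mask,
		       uint64_t timeout_ms, uint64_t poll_ms)
{
	uint64_t deadline = chip->now_ms + timeout_ms;

	assert(poll_ms > 0);

	for (;;) {
		if ((fake_status(chip) & mask) == mask)
			return 0;
		if (chip->now_ms >= deadline)
			return -1;
		chip->now_ms += poll_ms; /* stands in for sleeping */
	}
}

/* Convenience wrapper: device becomes ready after latency_ms. */
int try_wait(uint64_t latency_ms, uint64_t timeout_ms)
{
	struct fake_chip chip = { .now_ms = 0, .ready_at_ms = latency_ms };

	return fake_wait_for_stat(&chip, FAKE_STS_COMMAND_READY,
				  timeout_ms, 1);
}
```

With the measured 2226 ms latency, try_wait(2226, 2000) times out, while
try_wait(2226, 2250) and the doubled try_wait(2226, 4000) both succeed.
The 2250 ms case passes with only 24 ms to spare, which is why doubling
may be the safer experiment.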

Or perhaps we could even consider using wait_event_interruptible()
inside wait_for_tpm_stat(), since it is interruptible.

BR, Jarkko




