Re: TPM operation times out (very rarely)

On Fri, Jan 31, 2025 at 12:25:21PM +0200, Jarkko Sakkinen wrote:
> On Fri Jan 31, 2025 at 10:35 AM EET, Michal Suchánek wrote:
> > Hello,
> >
> > On Fri, Jan 31, 2025 at 01:31:01AM +0200, Jarkko Sakkinen wrote:
> > > On Wed Jan 29, 2025 at 6:02 PM EET, Jonathan McDowell wrote:
> > > > On Wed, Jan 29, 2025 at 04:27:15PM +0100, Michal Suchánek wrote:
> > > > > there is a problem report that booting a specific type of system about
> > > > > 0.1% of the time encrypted volume (using a PCR to release the key) fails
> > > > > to unlock because of TPM operation timeout.
> > > > > 
> > > > > Minimizing the test case failed so far.
> > > > > 
> > > > > For example, booting into text mode as opposed to graphical desktop
> > > > > makes the problem unreproducible.
> > > > > 
> > > > > The test is done with a frankenkernel that has TPM drivers about on par
> > > > > with Linux 6.4 but using actual Linux 6.4 the problem is not
> > > > > reproducible, either.
> > > > > 
> > > > > However, given the problem takes up to a day to reproduce I do not have
> > > > > much confidence in the negative results.
> > > >
> > > > So. We see what look like similar timeouts in our fleet, but I haven't
> > > > managed to produce a reliable test case that gives me any confidence
> > > > about what the cause is.
> > > >
> > > > https://lore.kernel.org/linux-integrity/Zv1810ZfEBEhybmg@xxxxxxxx/
> > > >
> > > > for my previous post about this.
> > > 
> > > Ugh, this was my first week at new job, sorry.
> > > 
> > > 2000 ms is like a spec value, which can be a bad idea. Please look at
> > > Table 18.
> > > 
> > > My guess is that GUI makes more stuff happening in the system, which
> > > could make latencies more shaky.
> > > 
> > > The most trivial candidate would be:
> > > 
> > > 	status = tpm_tis_status(chip);
> > > 	if ((status & TPM_STS_COMMAND_READY) == 0) {
> > > 		tpm_tis_ready(chip);
> > > 		if (wait_for_tpm_stat
> > > 		    (chip, TPM_STS_COMMAND_READY, TPM_TIS_TIMEOUT_MAX /* e.g. 2250 ms */,
> >
> > 2250 is more than the measured 2226 but I have no idea if that's random
> > or in some way deterministic.
> 
> Your text vs GUI at least gives evidence of stochasticity while not a
> full-fledged proof. You can expect e.g. more IRQs happening when you
> run a GUI. I did not engineer that number. You could e.g. double the
> original number. The whole framework for timeout_b is ridiculous (if
> it is because of me it does not change that fact).

It looks like timeout_b is used exclusively as the ready timeout *),
with the value coming from various sources depending on the chip type.

Then increasing it should not cause any problem other than the kernel
waiting longer when the TPM chip is really stuck.

* The one exception is st33zp24_pm_resume, which uses timeout_b when
waiting for TPM_STS_VALID.
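
To illustrate why a larger ready timeout should only cost time when the
chip is really stuck, here is a minimal user-space sketch. It is not
kernel code: struct fake_tpm, struct wait_result and wait_for_ready()
are hypothetical names made up for this example, and time is simulated
by a counter instead of a real clock. The 2000, 2226 and 2250 ms values
are the ones mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a TPM: the ready bit appears once
 * ready_at_ms of simulated time has passed. UINT32_MAX models a chip
 * that never becomes ready. */
struct fake_tpm {
	uint32_t ready_at_ms;
};

struct wait_result {
	bool ready;          /* true if the chip became ready in time */
	uint32_t waited_ms;  /* simulated time spent waiting */
};

/* Poll the fake chip every poll_ms until it is ready or timeout_ms of
 * simulated time has elapsed. The status is checked before the
 * deadline, so a chip that is ready exactly at the deadline succeeds. */
static struct wait_result wait_for_ready(const struct fake_tpm *tpm,
					 uint32_t timeout_ms,
					 uint32_t poll_ms)
{
	struct wait_result res = { .ready = false, .waited_ms = 0 };

	assert(poll_ms > 0); /* a zero step would never advance time */

	for (;;) {
		if (res.waited_ms >= tpm->ready_at_ms) {
			res.ready = true;
			return res;
		}
		if (res.waited_ms >= timeout_ms)
			return res; /* timed out */
		res.waited_ms += poll_ms;
	}
}
```

With a 1 ms poll step, a chip that needs 2226 ms fails with a 2000 ms
timeout and succeeds with 2250 ms; a chip that is ready after 5 ms
waits 5 ms in both cases; only the stuck chip waits the extra 250 ms.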

> Or perhaps we could even consider wait_event_interruptible() inside
> wait_for_tpm_stat(), since it is interruptible.

It already seems to be interruptible, at least in the tpm_tis_core
implementation. There is another implementation in xenfront, plus a few
wait_for_stat() variants without _tpm_ in the name.

Thanks

Michal



