On Thu, 2018-02-01 at 10:40 -0700, Jason Gunthorpe wrote: > On Thu, Feb 01, 2018 at 03:24:08PM +0000, James Bottomley wrote: > > > > > OK, I investigated but now my TPM has returned to normal (as in it > > passes the selftest immediately). There's clearly something that > > makes it return TPM_RC_TESTING to every self test probe for seconds > > at a time, but I don't know what it is. Sending a different > > command seems to cause the problem to clear (Managed to reproduce > > once with the patch to verify), so this is my proposed fix. It's > > clearly nonsensical to detach the driver because the self test > > still returns TPM_RC_TESTING, so convert that return to a > > TPM_RC_SUCCESS on timeout. It prints a warning message so we'll > > see it in the logs if it causes problems. Given that this seems to > > be some type of internal TPM issue, I don't believe changing the > > timings would work. > > But if a selftest returns TPM2_RC_TESTING I would expect the next > command to also fail with a testing in progress code? At least by the > spec.. No, the spec says only that the command may fail with TPM_RC_TESTING *if* the system it requires is under test. I have no idea what's up with my TPM. The selftests usually complete in under 1ms and return TPM_RC_SUCCESS. However occasionally the self test can return TPM_RC_TESTING for seconds. I've already booted with the patch and verified it doesn't induce any failures for me. Either the long running test is in a completely unconnected subsystem or invoking a different command forces the TPM to clear the testing state. > The point of invoking selftest is to get to a state where future TPM > commands will succeed, so returning immediately on RC_TESTING seems > wrong? Well it's definitely less wrong than the converse of actually detaching the TPM driver because a self test is apparently still in progress. If you depend on the TPM for a critical function, your entire system is hosed at that point. I honestly don't think we should be waiting for the self test at all. We should kick it off and treat any TPM_RC_TESTING error as -EAGAIN. We're already under fire for slow boot sequences and adding 2s just to wait for the TPM to self test adds to that for no real value. James