On Wed, Apr 15, 2020 at 04:51:39PM -0700, James Bottomley wrote:
> On Wed, 2020-04-15 at 15:45 -0700, Omar Sandoval wrote:
> > From: Omar Sandoval <osandov@xxxxxx>
> >
> > We've encountered a particular model of STMicroelectronics TPM that
> > transiently returns a bad value in the status register. This causes
> > the kernel to believe that the TPM is ready to receive a command when
> > it actually isn't, which in turn causes the send to time out in
> > get_burstcount(). In testing, reading the status register one extra
> > time convinces the TPM to return a valid value.
>
> Interesting, I've got a very early upgradeable nuvoton that seems to be
> behaving like this.

I'll attach the userspace reproducer I used to figure this out. I'd be
interested to see if it times out on your TPM, too. Note that it bangs
on /dev/mem and assumes that the MMIO address is 0xfed40000. That seems
to be the hard-coded address for x86 in the kernel, but just to be safe
you might want to check `grep MSFT0101 /proc/iomem`.

> > Signed-off-by: Omar Sandoval <osandov@xxxxxx>
> > ---
> >  drivers/char/tpm/tpm_tis_core.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
> > index 27c6ca031e23..277a21027fc7 100644
> > --- a/drivers/char/tpm/tpm_tis_core.c
> > +++ b/drivers/char/tpm/tpm_tis_core.c
> > @@ -238,6 +238,18 @@ static u8 tpm_tis_status(struct tpm_chip *chip)
> >  	rc = tpm_tis_read8(priv, TPM_STS(priv->locality), &status);
> >  	if (rc < 0)
> >  		return 0;
> > +	/*
> > +	 * Some STMicroelectronics TPMs have a bug where the status register is
> > +	 * sometimes bogus (all 1s) if read immediately after the access
> > +	 * register is written to. Bits 0, 1, and 5 are always supposed to read
> > +	 * as 0, so this is clearly invalid. Reading the register a second time
> > +	 * returns a valid value.
> > +	 */
> > +	if (unlikely(status == 0xff)) {
> > +		rc = tpm_tis_read8(priv, TPM_STS(priv->locality), &status);
> > +		if (rc < 0)
> > +			return 0;
> > +	}
>
> You theorize that your case is fixed by the second read, but what if it
> isn't and the second read also returns 0xff? Shouldn't we have a line
> here saying
>
> if (unlikely(status == 0xff))
> 	status = 0;
>
> So if we get a second 0xff we just pretend the thing isn't ready?

We've been running this workaround in production for awhile and the
hangs haven't happened since, and my userspace reproducer never
witnessed a second 0xff. But it wouldn't hurt, so I can add it anyways.
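
For reference, here is a minimal userspace sketch of what the combined
logic could look like: the extra read from the patch plus the suggested
fallback to 0 when the second read is also 0xff. This is not the kernel
code. The names read_status_with_retry(), read8_fn, struct fake_reg and
fake_read8() are made up for illustration, and the fake register simply
replays a list of canned values in place of tpm_tis_read8().

```c
#include <stdint.h>

/*
 * Hypothetical register accessor standing in for tpm_tis_read8():
 * returns 0 on success and a negative value on error.
 */
typedef int (*read8_fn)(void *ctx, uint8_t *val);

/*
 * Read the status register. If the first read returns all 1s, read it
 * once more. If the second read is also all 1s, report 0 ("not ready")
 * instead of trusting the value.
 */
static uint8_t read_status_with_retry(read8_fn read8, void *ctx)
{
	uint8_t status;

	if (read8(ctx, &status) < 0)
		return 0;

	if (status == 0xff) {
		/* Possibly a transient bogus value: read one more time. */
		if (read8(ctx, &status) < 0)
			return 0;
		/* Still bogus: pretend the device is not ready. */
		if (status == 0xff)
			status = 0;
	}

	return status;
}

/* Fake register that replays canned values (assumption for the demo). */
struct fake_reg {
	const uint8_t *values;
	unsigned int count;
	unsigned int pos;
};

static int fake_read8(void *ctx, uint8_t *val)
{
	struct fake_reg *reg = ctx;

	if (reg->pos >= reg->count)
		return -1;	/* simulate a bus error once values run out */
	*val = reg->values[reg->pos++];
	return 0;
}
```

With a fake register that yields 0xff followed by 0x40, the helper
returns 0x40. With 0xff twice, it returns 0, so a caller waiting for the
ready bit would keep waiting instead of sending a command.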