Re: [PATCH] Fix Atmel TPM crash caused by too frequent queries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jarkko,

https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3 
Is the one introducing the issue since 4.14. Then the other three commits
changed the relevant code a bit. Probably you can check the timestamp / release version
on each commit to understand the relationship.

I think the original patch commit message can help you understand the root cause.
Attaching the commit here for your convenience.

Thanks
Hao

-----

Since kernel 4.14, we fixed the TPM sleep logic
from msleep to usleep_range, so that the TPM
sleeps exactly with TPM_TIMEOUT (=5ms) afterward.
Before that fix, msleep(5) actually sleeps for
around 15ms.
The fix is https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3

That fix uncovered that the TPM_TIMEOUT was not properly
set previously. We recently found the TPM driver in kernel 4.14+
(including 5.9-rc4) crashes Atmel TPM chips with
too frequent TPM queries.

The TPM crash signature is
```
$ tpm_sealdata -z
Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl, code=0087 (135), I/O error

$ sudo dmesg | grep tpm0
[59154.665549] tpm tpm0: tpm_try_transmit: send(): error -62
[59154.809532] tpm tpm0: tpm_try_transmit: send(): error -62
```

>From the error code "-62", it looks similar to another bug
https://patchwork.kernel.org/patch/10520247/
where the "TPM_TIMEOUT_USECS_MAX" and "TPM_TIMEOUT_USEC_MIN"
is too small, which causes TPM get queried too frequently,
and thus crashes.

We finally found the root cause is related to
the sleep timeout used in `wait_for_tpm_stat`

In the 4.16, commit
https://github.com/torvalds/linux/commit/cf151a9a44d52a63332e8e926234574fe5a5d784
uses `TPM_POLL_SLEEP` instead of TPM_TIMEOUT for `wait_for_tpm_stat`
and set `TPM_POLL_SLEEP` (1ms).

Since 4.18 commits
https://github.com/torvalds/linux/commit/59f5a6b07f6434efac0057dc2f303a96b871811b
https://github.com/torvalds/linux/commit/424eaf910c329ab06ad03a527ef45dcf6a328f00
further reduced the timeout in wait_for_tpm_stat to less than 1ms.

Our patch here defines a `TPM_TIMEOUT_WAIT_STAT` (15ms) just for
`wait_for_tpm_stat the` to fix the crash in Atmel chips,
but not introduce unnecessary performance regression
in other workflows.
15ms is the actual timeout TPM chips use before the 4.14 fix
from msleep to usleep_range. Thus fixed the crash.

Exploring smaller `TPM_TIMEOUT_WAIT_STAT` should be a separate
thing to revisit later. This patch meant to fix the regression
introduced since 4.14

Test Plan:
- Run fixed kernel on system with Atmel TPM chips
 and ensure crash does not happen
- Run fixed kernel on system with other TPM chips
 (IFX / WEC / STM) ensure not breakages from tpm-tool

Signed-off-by: Hao Wu <hao.wu@xxxxxxxxxx>
---
drivers/char/tpm/tpm.h          | 1 +
drivers/char/tpm/tpm_tis_core.c | 3 +--
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 947d1db0a5cc..899097ae9756 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -41,6 +41,7 @@ enum tpm_timeout {
	TPM_TIMEOUT_RETRY = 100, /* msecs */
	TPM_TIMEOUT_RANGE_US = 300,	/* usecs */
	TPM_TIMEOUT_POLL = 1,	/* msecs */
+	TPM_TIMEOUT_WAIT_STAT = 15,      /* msecs */
	TPM_TIMEOUT_USECS_MIN = 100,      /* usecs */
	TPM_TIMEOUT_USECS_MAX = 500      /* usecs */
};
diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 65ab1b027949..8aa5eef10c28 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -80,8 +80,7 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
		}
	} else {
		do {
-			usleep_range(TPM_TIMEOUT_USECS_MIN,
-				     TPM_TIMEOUT_USECS_MAX);
+			tpm_msleep(TPM_TIMEOUT_WAIT_STAT);
			status = chip->ops->status(chip);
			if ((status & mask) == mask)
				return 0;
-- 
2.17.1



> On Sep 28, 2020, at 7:16 AM, Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx> wrote:
> 
> On Sun, Sep 27, 2020 at 11:03:47PM -0700, Hao Wu wrote:
>> Hi Jarkko,
>> 
>> Just to be clear it is not caused by that single commit, but a few accumulated commits
>> https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3
>> https://github.com/torvalds/linux/commit/cf151a9a44d52a63332e8e926234574fe5a5d784
>> https://github.com/torvalds/linux/commit/59f5a6b07f6434efac0057dc2f303a96b871811b
>> https://github.com/torvalds/linux/commit/424eaf910c329ab06ad03a527ef45dcf6a328f00
>> 
>> The easy way is probably just apply the patch I provided, and then revisit the value
>> for TPM_TIMEOUT_WAIT_STAT  
> 
> When you response, please quote properly, and do not top post.  The
> discussion is impossible to follow this way.
> 
> I'm not sure if I buy that. Which one is the first failing commit?
> 
> /Jarkko





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux Kernel]     [Linux Kernel Hardening]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux