Resend following email with plaintext ----- Hi Paul and Ken, Thanks for quick responses over this report. > Hao, I wouldn’t expect a longer timeout causing the TPM to be queried less frequently, but I do not know the code well. >From our understanding, the TPM queries might be retried due to some reason, increase timeout 3x would lower the query frequency to 1/3. The explanation might be wrong, but the fact looks like the timeout matters. Unfortunately, engineers from Rubrik are not experts over the TPM driver code neither :( >> Be careful about making this a global change. It could reduce the TPM performance by 3x. We don't want to affect all TPMs to fix a bug in an old TPM 1.2 chip from one vendor. > > Linux has a no regression policy, so the performance penalty wouldn’t matter. Unfortunately, the regression was only noticed several years after being introduced in Linux v4.14-rc2. So does that mean we are good to apply the global change ? Or we need to discuss about the actual fix further ? Thanks Hao > On Sep 12, 2020, at 12:37 AM, Paul Menzel <pmenzel@xxxxxxxxxxxxx> wrote: > > Dear Ken, > > > Any reason you stripped the CC list? Adding back Hao Wu and adding the patch author. > > Am 11.09.20 um 21:35 schrieb Ken Goldman: >> On 9/11/2020 12:18 AM, Greg KH wrote: >>> Thus the TPM get queried more frequently than before, which is likely the root cause of the atmel chip crash. We fix it by bumping up the TPM_TIMEOUT to 15ms. >>> >>> >>> Rubrik Patch >>> ``` >>> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h >>> index 72d3ce4..9b8f3f8 100644 >>> --- a/drivers/char/tpm/tpm.h >>> +++ b/drivers/char/tpm/tpm.h >>> @@ -49,7 +49,15 @@ enum tpm_const { >>> }; >>> >>> enum tpm_timeout { >>> - TPM_TIMEOUT = 5, /* msecs */ >>> + TPM_TIMEOUT = 15, /* msecs */ >>> TPM_TIMEOUT_RETRY = 100, /* msecs */ >>> TPM_TIMEOUT_RANGE_US = 300 /* usecs */ >>> }; >>> ``` >>> With the patch, the atmel TPM chip crash is fixed. >>> >>> Proposal >>> We want the kernel upstream to adopt the fix or have a better fix for the atmel chip while not bring performance regression for other TPM chips. We understand that https://github.com/torvalds/linux/commit/9f3fc7bcddcb51234e23494531f93ab60475e1c3 >>> was intended to shorten the TPM >> Is this the poll time, which was reduced at one point? If so ... > > The commit summary is: tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers. > >> Be careful about making this a global change. It could reduce the TPM performance by 3x. We don't want to affect all TPMs to fix a bug in an old TPM 1.2 chip from one vendor. > > Linux has a no regression policy, so the performance penalty wouldn’t matter. Unfortunately, the regression was only noticed several years after being introduced in Linux v4.14-rc2. > > Hao, I wouldn’t expect a longer timeout causing the TPM to be queried less frequently, but I do not know the code well. > > > Kind regards, > > Paul