> On Jul 11, 2021, at 12:37 AM, Hao Wu <hao.wu@xxxxxxxxxx> wrote:
>
>> On Jul 9, 2021, at 12:23 PM, Hao Wu <hao.wu@xxxxxxxxxx> wrote:
>>
>>> On Jul 9, 2021, at 10:47 AM, Jarkko Sakkinen <jarkko@xxxxxxxxxx> wrote:
>>>
>>> On Thu, Jul 08, 2021 at 09:40:28PM -0700, Hao Wu wrote:
>>>> The Atmel TPM 1.2 chips crash with error
>>>> `tpm_try_transmit: send(): error -62` since kernel 4.14.
>>>> It is observed in the kernel log after running `tpm_sealdata -z`.
>>>> The error thrown by the command is as follows:
>>>> ```
>>>> $ tpm_sealdata -z
>>>> Tspi_Key_LoadKey failed: 0x00001087 - layer=tddl,
>>>> code=0087 (135), I/O error
>>>> ```
>>>>
>>>> The issue was reproduced with the following Atmel TPM chip:
>>>> ```
>>>> $ tpm_version
>>>> T0 TPM 1.2 Version Info:
>>>>   Chip Version: 1.2.66.1
>>>>   Spec Level: 2
>>>>   Errata Revision: 3
>>>>   TPM Vendor ID: ATML
>>>>   TPM Version: 01010000
>>>>   Manufacturer Info: 41544d4c
>>>> ```
>>>>
>>>> The root cause is that the TPM calls to msleep()
>>>> were replaced with usleep_range() [1], which reduces
>>>> the actual timeout. Experiments show that
>>>> the original msleep(5) actually sleeps for 15ms.
>>>> Because of a known timeout issue in the Atmel TPM 1.2 chip,
>>>> a timeout shorter than 15ms can cause the error described above.
>>>>
>>>> A few further changes in kernel 4.16 [2] and 4.18 [3, 4]
>>>> reduced the timeout to less than 1ms. Experiments show that
>>>> the problematic timeout in the latest kernel is the one
>>>> for `wait_for_tpm_stat`.
>>>>
>>>> To fix it, the patch reverts the timeout of `wait_for_tpm_stat`
>>>> to 15ms for all Atmel TPM 1.2 chips, but leaves it untouched
>>>> for Atmel TPM 2.0 chips and chips from other vendors.
>>>> As explained above, the chosen 15ms timeout is
>>>> the actual timeout before this issue was introduced,
>>>> thus the old value is used here.
>>>> In particular, TPM_ATML_TIMEOUT_WAIT_STAT_MIN is set to 14700us and
>>>> TPM_ATML_TIMEOUT_WAIT_STAT_MAX is set to 15000us, according to
>>>> the existing TPM_TIMEOUT_RANGE_US (300us).
>>>> The fix has been tested on a system with the affected Atmel chip,
>>>> with no issues observed after boot up.
>>>>
>>>> References:
>>>> [1] 9f3fc7bcddcb tpm: replace msleep() with usleep_range() in TPM
>>>>     1.2/2.0 generic drivers
>>>> [2] cf151a9a44d5 tpm: reduce tpm polling delay in tpm_tis_core
>>>> [3] 59f5a6b07f64 tpm: reduce poll sleep time in tpm_transmit()
>>>> [4] 424eaf910c32 tpm: reduce polling time to usecs for even finer
>>>>     granularity
>>>>
>>>> Fixes: 9f3fc7bcddcb ("tpm: replace msleep() with usleep_range() in TPM 1.2/2.0 generic drivers")
>>>> Link: https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@xxxxxxxxxx/
>>>> Signed-off-by: Hao Wu <hao.wu@xxxxxxxxxx>
>>>> ---
>>>> This version (v2) has the following changes on top of the last (v1):
>>>> - follow the existing way to define two timeouts (min and max)
>>>>   for Atmel chips, thus keeping the exact timeout logic for
>>>>   non-Atmel chips.
>>>> - limit the timeout increase to Atmel TPM 1.2 chips only,
>>>>   because it is not an issue for TPM 2.0 chips yet.
>>>>
>>>> Test Plan:
>>>> - Run the fixed kernel with Atmel TPM chips and confirm the crash is fixed.
>>>> - Run the fixed kernel with non-Atmel TPM chips and confirm
>>>>   the timeout has not changed.
>>>>
>>>>  drivers/char/tpm/tpm.h          |  6 ++++--
>>>>  drivers/char/tpm/tpm_tis_core.c | 23 +++++++++++++++++++++--
>>>>  include/linux/tpm.h             |  3 +++
>>>>  3 files changed, 28 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
>>>> index 283f78211c3a..6de1b44c4aab 100644
>>>> --- a/drivers/char/tpm/tpm.h
>>>> +++ b/drivers/char/tpm/tpm.h
>>>> @@ -41,8 +41,10 @@ enum tpm_timeout {
>>>>  	TPM_TIMEOUT_RETRY = 100, /* msecs */
>>>>  	TPM_TIMEOUT_RANGE_US = 300, /* usecs */
>>>>  	TPM_TIMEOUT_POLL = 1, /* msecs */
>>>> -	TPM_TIMEOUT_USECS_MIN = 100, /* usecs */
>>>> -	TPM_TIMEOUT_USECS_MAX = 500 /* usecs */
>>>> +	TPM_TIMEOUT_USECS_MIN = 100, /* usecs */
>>>> +	TPM_TIMEOUT_USECS_MAX = 500, /* usecs */
>>>> +	TPM_ATML_TIMEOUT_WAIT_STAT_MIN = 14700, /* usecs */
>>>> +	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000 /* usecs */
>>>>  };
>>>>
>>>>  /* TPM addresses */
>>>> diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
>>>> index 55b9d3965ae1..ae27d66fdd94 100644
>>>> --- a/drivers/char/tpm/tpm_tis_core.c
>>>> +++ b/drivers/char/tpm/tpm_tis_core.c
>>>> @@ -80,8 +80,17 @@ static int wait_for_tpm_stat(struct tpm_chip *chip, u8 mask,
>>>>  		}
>>>>  	} else {
>>>>  		do {
>>>> -			usleep_range(TPM_TIMEOUT_USECS_MIN,
>>>> -				     TPM_TIMEOUT_USECS_MAX);
>>>> +			/* this code path could be executed before
>>>> +			 * timeouts initialized in chip instance.
>>>> +			 */
>>>> +			if (chip->timeout_wait_stat_min &&
>>>> +			    chip->timeout_wait_stat_max)
>>>> +				usleep_range(chip->timeout_wait_stat_min,
>>>> +					     chip->timeout_wait_stat_max);
>>>> +			else
>>>> +				usleep_range(TPM_TIMEOUT_USECS_MIN,
>>>> +					     TPM_TIMEOUT_USECS_MAX);
>>>
>>> This starts to look otherwise fine but you don't need this condition.
>>> Just initialize variables to TPM_TIMEOUT_USECS_{MIN, MAX} for non-Atmel.
>> Not sure I got your point or not. We have discussed this question a few rounds before,
>> I answered you about this.
>> This check is required because the func `wait_for_tpm_stat` runs
>> before the initialization I added in `tpm_tis_core_init`:
>> ```
>> + chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
>> + chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;
>> ```
>> so we need the condition as a fallback to avoid a crash at system startup.
>>
>> Let me know if this makes sense. If needed, I can confirm it again.
> I double checked this, and found that the current init lines in `tpm_tis_core_init`
> actually run before this code path now. Maybe it was an issue in one
> of my old revisions and I had the wrong impression.
> The condition seems OK to remove in the current revision.
>
> What I am not fully sure of is whether the behavior is consistent across
> other 1.2 chips and TPM 2.0 chips.
> Should we still keep the condition for robustness, or ship without it?
>
This has been updated in a v3 patch:
https://patchwork.kernel.org/project/linux-integrity/patch/20210711075122.30056-1-hao.wu@xxxxxxxxxx/
Let me know if that is preferred. I tested on both Atmel and non-Atmel machines. It works fine so far.

>>> /Jarkko
>>
>> Hao
>
> Hao

Hao
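
For reference, here is a minimal userspace sketch of the "initialize the defaults, then override for Atmel TPM 1.2" approach discussed above, which is what lets the condition in `wait_for_tpm_stat` go away. It is only an illustration and not the v3 patch itself: `struct demo_chip`, `demo_init_wait_stat()`, and the `is_atmel` / `is_tpm2` flags are hypothetical names, and how the driver actually detects the vendor is not shown here. The constants are the ones from the quoted diff, with the Atmel minimum derived as max minus TPM_TIMEOUT_RANGE_US.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Timeout values from the quoted patch, all in microseconds. */
enum {
	TPM_TIMEOUT_RANGE_US = 300,
	TPM_TIMEOUT_USECS_MIN = 100,
	TPM_TIMEOUT_USECS_MAX = 500,
	TPM_ATML_TIMEOUT_WAIT_STAT_MAX = 15000,
	/* 15000 - 300 = 14700, matching the value in the patch. */
	TPM_ATML_TIMEOUT_WAIT_STAT_MIN =
		TPM_ATML_TIMEOUT_WAIT_STAT_MAX - TPM_TIMEOUT_RANGE_US,
};

/* Hypothetical stand-in for the two fields the patch adds to the chip. */
struct demo_chip {
	unsigned int timeout_wait_stat_min;
	unsigned int timeout_wait_stat_max;
};

/*
 * Hypothetical init helper: always set the generic defaults first, then
 * override them for Atmel TPM 1.2 only. Because both fields are non-zero
 * after this runs, the polling loop can pass them to usleep_range()
 * unconditionally, provided init happens before the first poll.
 */
static void demo_init_wait_stat(struct demo_chip *chip, bool is_atmel,
				bool is_tpm2)
{
	assert(chip != NULL);

	chip->timeout_wait_stat_min = TPM_TIMEOUT_USECS_MIN;
	chip->timeout_wait_stat_max = TPM_TIMEOUT_USECS_MAX;

	if (is_atmel && !is_tpm2) {
		chip->timeout_wait_stat_min = TPM_ATML_TIMEOUT_WAIT_STAT_MIN;
		chip->timeout_wait_stat_max = TPM_ATML_TIMEOUT_WAIT_STAT_MAX;
	}
}
```

Calling `demo_init_wait_stat(&chip, true, false)` models an Atmel TPM 1.2 chip; any other combination of flags keeps the generic 100us to 500us range, so non-Atmel behavior is unchanged.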