> On May 9, 2021, at 7:17 PM, Mimi Zohar <zohar@xxxxxxxxxxxxx> wrote:
>
> [Cc'ing Jarkko Sakkinen <jarkko@xxxxxxxxxx>]
>
> On Sat, 2021-05-08 at 23:31 -0700, Hao Wu wrote:
>>> On May 8, 2021, at 11:18 PM, Hao Wu <hao.wu@xxxxxxxxxx> wrote:
>>>
>>>> On Nov 18, 2020, at 1:11 PM, Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx> wrote:
>>>>
>>>> On Fri, Nov 13, 2020 at 08:39:28PM -0800, Hao Wu wrote:
>>>>>> On Oct 17, 2020, at 10:20 PM, Hao Wu <hao.wu@xxxxxxxxxx> wrote:
>>>>>>
>>>>>>> On Oct 17, 2020, at 10:09 PM, Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On Fri, Oct 16, 2020 at 11:11:37PM -0700, Hao Wu wrote:
>>>>>>>>> On Oct 1, 2020, at 4:04 PM, Jarkko Sakkinen <jarkko.sakkinen@xxxxxxxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> On Thu, Oct 01, 2020 at 11:32:59AM -0700, James Bottomley wrote:
>>>>>>>>>> On Thu, 2020-10-01 at 14:15 -0400, Nayna wrote:
>>>>>>>>>>> On 10/1/20 12:53 AM, James Bottomley wrote:
>>>>>>>>>>>> On Thu, 2020-10-01 at 04:50 +0300, Jarkko Sakkinen wrote:
>>>>>>>>>>>>> On Wed, Sep 30, 2020 at 03:31:20PM -0700, James Bottomley wrote:
>>>>>>>>>>>>>> On Thu, 2020-10-01 at 00:09 +0300, Jarkko Sakkinen wrote:
>>>>>>>>>> [...]
>>>>>>>>>>>>>>> I also wonder if we could adjust the frequency dynamically.
>>>>>>>>>>>>>>> I.e. start with optimistic value and lower it until finding
>>>>>>>>>>>>>>> the sweet spot.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The problem is the way this crashes: the TPM seems to be
>>>>>>>>>>>>>> unrecoverable. If it were recoverable without a hard reset of
>>>>>>>>>>>>>> the entire machine, we could certainly play around with it. I
>>>>>>>>>>>>>> can try alternative mechanisms to see if anything's viable, but
>>>>>>>>>>>>>> to all intents and purposes, it looks like my TPM simply stops
>>>>>>>>>>>>>> responding to the TIS interface.
>>>>>>>>>>>>>
>>>>>>>>>>>>> A quickly scraped idea probably with some holes in it but I was
>>>>>>>>>>>>> thinking something like
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. Initially set a slow value for the latency; this could be the
>>>>>>>>>>>>>    original 15 ms.
>>>>>>>>>>>>> 2. Use this to read TPM_PT_VENDOR_STRING_*.
>>>>>>>>>>>>> 3. Based on the vendor string, look up a latency that works from a
>>>>>>>>>>>>>    fixup table
>>>>>>>>>>>>>    (the fallback latency could be the existing latency).
>>>>>>>>>>>>
>>>>>>>>>>>> Well, yes, that was sort of what I was thinking of doing for the
>>>>>>>>>>>> Atmel ... except I was thinking of using the TIS VID (16-bit
>>>>>>>>>>>> assigned vendor ID) which means we can get the information to set
>>>>>>>>>>>> the timeout before we have to do any TPM operations.
>>>>>>>>>>>
>>>>>>>>>>> I wonder if the timeout issue exists for all TPM commands for the
>>>>>>>>>>> same manufacturer. For example, does the ATMEL TPM also crash when
>>>>>>>>>>> extending PCRs?
>>>>>>>>>>>
>>>>>>>>>>> In addition to defining a per TPM vendor based lookup table for
>>>>>>>>>>> timeout, would it be a good idea to also define a Kconfig/boot param
>>>>>>>>>>> option to allow timeout setting? This would allow setting the timeout
>>>>>>>>>>> based on the specific use.
>>>>>>>>>>
>>>>>>>>>> I don't think we need go that far (yet). The timing change has been in
>>>>>>>>>> upstream since:
>>>>>>>>>>
>>>>>>>>>> commit 424eaf910c329ab06ad03a527ef45dcf6a328f00
>>>>>>>>>> Author: Nayna Jain <nayna@xxxxxxxxxxxxxxxxxx>
>>>>>>>>>> Date: Wed May 16 01:51:25 2018 -0400
>>>>>>>>>>
>>>>>>>>>>     tpm: reduce polling time to usecs for even finer granularity
>>>>>>>>>>
>>>>>>>>>> Which was in the released kernel 4.18: over two years ago. In all that
>>>>>>>>>> time we've discovered two problems: mine which looks to be an artifact
>>>>>>>>>> of an experimental upgrade process in a new nuvoton and the Atmel.
>>>>>>>>>> That means pretty much every other TPM simply works with the existing
>>>>>>>>>> timings.
>>>>>>>>>>
>>>>>>>>>>> I was also thinking how will we decide the lookup table values for
>>>>>>>>>>> each vendor?
>>>>>>>>>>
>>>>>>>>>> I wasn't thinking we would. I was thinking I'd do a simple exception
>>>>>>>>>> for the Atmel and nothing else. I don't think my Nuvoton is in any way
>>>>>>>>>> characteristic. Indeed my pluggable TPM rainbow bridge system works
>>>>>>>>>> just fine with a Nuvoton and the current timings.
>>>>>>>>>>
>>>>>>>>>> We can add additional exceptions if they actually turn up.
>>>>>>>>>
>>>>>>>>> I'd add a table and fallback.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi folks,
>>>>>>>>
>>>>>>>> I want to follow up on this a bit and check whether we reached a consensus
>>>>>>>> on how to fix the timeout issue for the Atmel chip.
>>>>>>>>
>>>>>>>> Should we revert the changes or introduce the lookup table for chips?
>>>>>>>>
>>>>>>>> Is there anything I can help with from the Rubrik side?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Hao
>>>>>>>
>>>>>>> There is nothing to revert as the previous was not applied but I'm
>>>>>>> of course ready to review any new attempts.
>>>>>>>
>>>>>>
>>>>>> Hi Jarkko,
>>>>>>
>>>>>> By “revert” I meant we revert the timeout value changes by applying
>>>>>> the patch I proposed, as the timeout value discussed does cause issues.
>>>>>>
>>>>>> Why don’t we apply the patch and then improve the perf in a way that
>>>>>> does not break TPMs?
>>>>>>
>>>>>> Hao
>>>>>
>>>>> Hi Jarkko and folks,
>>>>>
>>>>> It’s been a while since our last discussion. I want to push a fix in the upstream for the Atmel chip.
>>>>> It looks like we currently have the following choices:
>>>>> 1. Generic fix for all vendors: have a lookup table for the sleep time of wait_for_tpm_stat
>>>>>    (i.e. TPM_TIMEOUT_WAIT_STAT in my proposed patch).
>>>>> 2. Quick fix for the regression: change the sleep time of wait_for_tpm_stat back to 15ms.
>>>>>    This is the currently proposed patch.
>>>>> 3. Fix the regression by making an exception for the Atmel chip.
>>>>>
>>>>> Should we reach consensus on which one we want to pursue before digging
>>>>> into the implementation of the patch? I prefer to fix the
>>>>> regression with 2, and then pursue 1 as a long-term solution. 3 is
>>>>> hacky.
>>>>
>>>> What does option 1 fix for *all* vendors?
>>>>
>>>>> Let me know what you guys think
>>>>>
>>>>> Hao
>>>>
>>>> /Jarkko
>>>
>>> Hi Jarkko and folks,
>>>
>>> It has been a while again. In my previous message I answered Jarkko’s question about option 1.
>>> Jarkko, let me know if it is clear to you or if you have further questions or suggestions on what to do next.
>>> Somehow I couldn’t find the last message I sent but it is in
>>> https://patchwork.kernel.org/project/linux-integrity/patch/20200926223150.109645-1-hao.wu@xxxxxxxxxx/
>>>
>>> At a high level, option 1 is to add a timing lookup table for each manufacturer, so we can
>>> configure timing for each chip individually. Then we don’t need to worry that fixing the ATMEL
>>> timing may cause performance degradation for other chips.
>>>
>>> I do want to push the fix into the TPM driver; the issue is likely to be hit again when people
>>> refactor without testing chips from all manufacturers.
>>>
>>> Let me know how I should push this forward.
>>>
>>> Thanks
>>> Hao
>>>
>> It looks like Jarkko’s email address (jarkko.sakkinen@xxxxxxxxxxxxxxx) is unreachable now,
>> can another TPM maintainer or reviewer help make a call and unblock this?
>
> A while ago Jarkko asked everyone to use his kernel.org address.
>
> Mimi

Ah, thanks Mimi, just found Jarkko’s address.
Jarkko, please check the message above when you have a chance.

Hao
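
For reference, the "table and fallback" idea discussed above (option 1) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the proposed patch: the struct and function names are hypothetical, the lookup is keyed on the TIS vendor ID as James suggested, 0x1114 is assumed to be Atmel's vendor ID, and the timing values are placeholders (a window close to the old 15 ms sleep for the exception, a short polling window as the fallback).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical polling window for wait_for_tpm_stat(), in microseconds. */
struct tpm_poll_timing {
	uint32_t min_usecs;
	uint32_t max_usecs;
};

/* Hypothetical fixup entry keyed on the 16-bit TIS vendor ID. */
struct tpm_poll_fixup {
	uint16_t vid;
	struct tpm_poll_timing timing;
};

/* Assumed fallback: a short polling window for chips with no exception. */
static const struct tpm_poll_timing tpm_poll_default = { 100, 500 };

/*
 * Assumed exception list. 0x1114 is taken here to be Atmel's vendor ID,
 * and the values are placeholders close to the old 15 ms sleep.
 */
static const struct tpm_poll_fixup tpm_poll_fixups[] = {
	{ 0x1114, { 14700, 15000 } },
};

/* Return the per-vendor timing if one is listed, else the fallback. */
static struct tpm_poll_timing tpm_poll_timing_for(uint16_t vid)
{
	size_t i;
	size_t n = sizeof(tpm_poll_fixups) / sizeof(tpm_poll_fixups[0]);

	for (i = 0; i < n; i++) {
		if (tpm_poll_fixups[i].vid == vid)
			return tpm_poll_fixups[i].timing;
	}
	return tpm_poll_default;
}
```

With a table like this, chips that are not listed keep the fast polling, so slowing down one vendor does not cost the others anything, and further exceptions can be added as single table entries if they turn up.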