On Wed, 2022-05-18 at 15:26 -0400, Nayna wrote:
> On 5/16/22 13:57, Jarkko Sakkinen wrote:
> > On Thu, May 12, 2022 at 08:32:55AM -0400, James Bottomley wrote:
> > > On Thu, 2022-05-12 at 08:21 -0400, Mimi Zohar wrote:
> > > [...]
> > > > This patch reverts commit 5ef924d9e2e8 ("tpm: use tpm_msleep()
> > > > value as max delay"). Are you experiencing TPM issues that
> > > > require it?
> > > I am:
> > >
> > > https://lore.kernel.org/linux-integrity/1531328689.3260.8.camel@xxxxxxxxxxxxxxxxxxxxx/
> > >
> > > I'm about 24h into a soak test of the patch with no TPM failure
> > > so far. I think it probably needs to run another 24h just to be
> > > sure, but it does seem the theory is sound (my TPM gets annoyed
> > > by being poked too soon) so reverting 5ef924d9e2e8 looks to be
> > > the correct action. The only other ways I've found to fix this
> > > are either revert the usleep_range patch altogether or increase
> > > the timings:
> > >
> > > https://lore.kernel.org/linux-integrity/1531329074.3260.9.camel@xxxxxxxxxxxxxxxxxxxxx/
> > >
> > > Which obviously pushes the min past whatever issue my TPM is
> > > having even with 5ef924d9e2e8 applied.
> > >
> > > Given that even the commit message for 5ef924d9e2e8 admits it
> > > only shaves about 12% off the TPM response time, that would
> > > appear to be an optimization too far if it's going to cause some
> > > TPMs to fail.
> > >
> > > James
> > What if TPM started with the timings as they are now and use the
> > "reverted" timings if coming up too early? The question here is
> > though, is such complexity worth of anything or should we just
> > revert and do nothing else.
> >
> TCG Specification(TCG PC Client Device Driver Design Principles,
> Section 10), says - General control timeouts, denoted as TIMEOUT_A
> (A), TIMEOUT_B (B), TIMEOUT_C (C) and TIMEOUT_D (D), are the maximum
> waiting time from a certain control operation from the DD until the
> TPM shows the expected status change.

Actually, this is nothing to do with the TIMEOUTS_A-D: those are the
maximum times before a command should complete. This is the minimum
time we should wait between pokes of the TPM to see if it is ready.
Usually the use case is:

while (read device status gives not ready)
	tpm_msleep(something)

The tpm_msleep gives up CPU control (to prevent huge amounts of busy
waiting) but before 424eaf910c32 ("tpm: reduce polling time to usecs
for even finer granularity") we would sleep for an entire tick (time
taken to make the process runnable) before the next poll, and since
most TPM commands don't return immediately, that was a gate on how
fast you could do simple TPM operations (like PCR extend).

As far as I know, no TCG spec gives any details of the minimum wait
time between poll cycles, so this is really something the manufacturer
has to tell us.

Just for completeness, my soak test did run to completion, but my TPM
has since failed and dropped off the bus, so simply reverting this
patch (5ef924d9e2e8) isn't sufficient to fully fix my problem.

James
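
To make the distinction between the command timeout and the minimum gap
between status reads concrete, here is a minimal userspace sketch of the
polling loop described above. It is not the kernel driver code: struct
fake_tpm, fake_tpm_ready(), fake_sleep() and poll_until_ready() are all
hypothetical names, the clock is simulated rather than real, and the
"upset" flag is only an assumed model of a TPM that dislikes being read
too soon after the previous read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated TPM; every field here is an assumption. */
struct fake_tpm {
	uint64_t now_us;       /* simulated clock, in microseconds */
	uint64_t ready_at_us;  /* time at which the command completes */
	uint64_t min_gap_us;   /* smallest gap tolerated between status reads */
	uint64_t last_poke_us; /* time of the previous status read */
	bool poked;            /* status has been read at least once */
	bool upset;            /* set when two reads came too close together */
};

/* Read the status once, recording whether this read came too soon. */
static bool fake_tpm_ready(struct fake_tpm *t)
{
	if (t->poked && t->now_us - t->last_poke_us < t->min_gap_us)
		t->upset = true;
	t->poked = true;
	t->last_poke_us = t->now_us;
	return t->now_us >= t->ready_at_us;
}

/* Stand-in for tpm_msleep(): only advances the simulated clock. */
static void fake_sleep(struct fake_tpm *t, uint64_t us)
{
	t->now_us += us;
}

/*
 * Poll until the device reports ready or timeout_us has elapsed.
 * sleep_us is the delay between status reads; timeout_us plays the
 * role of the overall command timeout. Returns the number of status
 * reads performed, or -1 if the deadline passed first.
 */
static int poll_until_ready(struct fake_tpm *t, uint64_t sleep_us,
			    uint64_t timeout_us)
{
	uint64_t deadline = t->now_us + timeout_us;
	int polls = 0;

	assert(sleep_us > 0);
	for (;;) {
		polls++;
		if (fake_tpm_ready(t))
			return polls;
		if (t->now_us >= deadline)
			return -1;
		fake_sleep(t, sleep_us);
	}
}
```

In this model a shorter sleep_us notices completion sooner but reads the
status more often, and once sleep_us drops below min_gap_us the device
is marked upset even though the overall timeout is never reached. That
is the sense in which the two limits are independent.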