Re: [PATCH] tpm: Make timeout logic simpler and more robust

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2019-03-12 at 20:08 +0000, Calvin Owens wrote:
> On Tuesday 03/12 at 13:04 -0400, Mimi Zohar wrote:
> > On Mon, 2019-03-11 at 16:54 -0700, Calvin Owens wrote:
> > > We're having lots of problems with TPM commands timing out, and we're
> > > seeing these problems across lots of different hardware (both v1/v2).
> > > 
> > > I instrumented the driver to collect latency data, but I wasn't able to
> > > find any specific timeout to fix: it seems like many of them are too
> > > aggressive. So I tried replacing all the timeout logic with a single
> > > universal long timeout, and found that makes our TPMs 100% reliable.
> > > 
> > > Given that this timeout logic is very complex, problematic, and appears
> > > to serve no real purpose, I propose simply deleting all of it.
> > 
> > Normally before sending such a massive change like this, included in
> > the bug report or patch description, there would be some indication as
> > to which kernel introduced a regression.  Has this always been a
> > problem? Is this something new? How new?
> 
> Honestly we've always had problems with flakiness from these devices,
> but it seems to have regressed sometime between 4.11 and 4.16.

Well, that's a start.  Around 4.10 is when we started noticing TPM
performance issues due to the change in the kernel timer scheduling.
 This resulted in commit a233a0289cf9 ("tpm: msleep() delays - replace
with usleep_range() in i2c nuvoton driver"), which was upstreamed in
4.12.

At the other end, James was referring to commit "424eaf910c32 tpm:
reduce polling time to usecs for even finer granularity", which was
introduced in 4.18.

> 
> I wish a had a better answer for you: we need on the order of a hundred
> machines to see the difference, and setting up these 100+ machine tests
> is unfortunately involved enough that e.g. bisecting it just isn't
> feasible :/

> What I can say for sure is that this patch makes everything much better
> for us. If there's anything in particular you'd like me to test, I have
> an army of machines I'm happy to put to use, let me know :)

I would assume not all of your machines are the same nor have the same
TPM.  Could you verify that this problem is across the board, not
limited to a particular TPM.

BTW, are you seeing this problem with both TPM 1.2 or 2.0?

thanks!

Mimi




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux Kernel]     [Linux Kernel Hardening]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux