On Wed, Mar 13, 2019 at 03:22:32PM +0200, Jarkko Sakkinen wrote:
> On Tue, Mar 12, 2019 at 01:04:58PM -0400, Mimi Zohar wrote:
> > On Mon, 2019-03-11 at 16:54 -0700, Calvin Owens wrote:
> > > We're having lots of problems with TPM commands timing out, and we're
> > > seeing these problems across lots of different hardware (both v1/v2).
> > >
> > > I instrumented the driver to collect latency data, but I wasn't able to
> > > find any specific timeout to fix: it seems like many of them are too
> > > aggressive. So I tried replacing all the timeout logic with a single
> > > universal long timeout, and found that makes our TPMs 100% reliable.
> > >
> > > Given that this timeout logic is very complex, problematic, and appears
> > > to serve no real purpose, I propose simply deleting all of it.
> >
> > Normally before sending such a massive change like this, included in
> > the bug report or patch description, there would be some indication as
> > to which kernel introduced a regression. Has this always been a
> > problem? Is this something new? How new?
>
> Also: is the problem in timeouts, durations or both. Does make sense
> to fix something that isn't broken...

And maybe the fix is too big a hammer. We could possibly just decrease
the granularity rather than fully take it away.

/Jarkko