On Mon, Mar 11, 2019 at 05:27:43PM -0700, James Bottomley wrote:
> On Mon, 2019-03-11 at 16:54 -0700, Calvin Owens wrote:
> > We're having lots of problems with TPM commands timing out, and we're
> > seeing these problems across lots of different hardware (both v1/v2).
> >
> > I instrumented the driver to collect latency data, but I wasn't able
> > to find any specific timeout to fix: it seems like many of them are
> > too aggressive. So I tried replacing all the timeout logic with a
> > single universal long timeout, and found that makes our TPMs 100%
> > reliable.
> >
> > Given that this timeout logic is very complex, problematic, and
> > appears to serve no real purpose, I propose simply deleting all of
> > it.
>
> "no real purpose" is a bit strong given that all these timeouts are
> standards mandated. The purpose stated by the standards is that there
> needs to be a way of differentiating the TPM crashed from the TPM is
> taking a very long time to respond. For a normally functioning TPM it
> looks complex and unnecessary, but for a malfunctioning one it's a
> lifesaver.

Standards should only be followed when they make practical sense, and
ignored when they don't. The range is only up to 2s anyway.

/Jarkko