Re: [PATCH] tpm_tis: Add missing start/stop_tpm_chip calls

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Sat, 30 Jan 2021 20:18:27 -0800

On Sat, 2021-01-30 at 19:36 -0800, Guenter Roeck wrote:
> On 1/30/21 4:41 PM, James Bottomley wrote:
> > On Sat, 2021-01-30 at 15:49 -0800, Guenter Roeck wrote:
> > > On 1/29/21 2:59 PM, Jarkko Sakkinen wrote:
> > > > On Tue, Jan 26, 2021 at 04:46:07PM +0100, Łukasz Majczak wrote:
> > > > > Hi Jarkko, Guenter
> > > > > 
> > > > > Yes, here are the logs when failure occurs -
> > > > > https://gist.github.com/semihalf-majczak-lukasz/1575461f585f1e7fb1e9366b8eceaab9
> > > > > Look for a phrase "TPM returned invalid status"
> > > > > 
> > > > > Guenter - good suggestion - I will try to keep it as tight as
> > > > > possible.
> > > > > 
> > > > > Best regards,
> > > > > Lukasz
> > > > 
> > > > Is it possible for you to try it with linux-next? Thanks. It's a
> > > > known issue, which ought to be fixed by now.
> > > > 
> > > > The log message is harmless: it's a warning, not a panic, and
> > > > does not endanger system stability. WARN() always dumps a stack
> > > > trace. No oops is happening.
> > > > 
> > > 
> > > There is a note in the kernel documentation which states:
> > > 
> > > Note that the WARN()-family should only be used for "expected to
> > > be unreachable" situations. If you want to warn about "reachable
> > > but undesirable" situations, please use the pr_warn()-family of
> > > functions.
> > 
> > It fits the definition.  The warning only triggers if the access is
> > in the wrong locality, which should be impossible, so the warning
> > should be unreachable.
> > 
> Thanks a lot for the clarification. So a warning traceback in the
> kernel doesn't necessarily suggest that there is a serious problem
> that should be fixed; it only means that some code is executed which
> should not be reachable (but is otherwise harmless).
> 
> That makes me wonder, though, if it would make sense to mark such
> harmless tracebacks differently. The terms "warning" and "harmless"
> sound like a bit of a contradiction to me (especially for systems
> where panic_on_warn is set).

Well, it's not harmless.  Because it occurs at start of day, we clear
the ineffective command and fall back to default values, which happen
to work fine for the TPM in question, so the problem is pretty much
covered up.  Had it occurred anywhere else, the command data would have
been lost, with unknown ramifications for user space, possibly leading
to a TPM failure.

Hopefully this means this is the only place we screwed up, but you can
see why a scary warning and stack trace is appropriate: if it triggers,
something in the kernel violated the TPM command model.
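
To make the failure mode concrete, here is a rough user-space sketch of
the pattern.  It is not the tpm_tis code: every name in it (fake_tpm,
fake_warn_on, fake_tpm_set_timeout, the 750 default) is hypothetical
and only models "wrong locality means the command is dropped and the
default stays in place".

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model; none of these names come from tpm_tis. */
#define FAKE_TPM_DEFAULT_TIMEOUT 750u

struct fake_tpm {
	int active_locality;	/* -1 means no locality is currently held */
	unsigned int timeout;	/* value the command below tries to set */
	unsigned int warnings;	/* times the "unreachable" path was hit */
};

/* Stand-in for a WARN()-style check: report loudly, then keep running. */
static bool fake_warn_on(struct fake_tpm *tpm, bool cond, const char *what)
{
	if (cond) {
		tpm->warnings++;
		fprintf(stderr, "WARNING: %s (should be unreachable)\n", what);
	}
	return cond;
}

static void fake_tpm_init(struct fake_tpm *tpm)
{
	tpm->active_locality = -1;
	tpm->timeout = FAKE_TPM_DEFAULT_TIMEOUT;
	tpm->warnings = 0;
}

/* Returns true if the command took effect, false if it was dropped. */
static bool fake_tpm_set_timeout(struct fake_tpm *tpm, int locality,
				 unsigned int value)
{
	if (fake_warn_on(tpm, tpm->active_locality != locality,
			 "access in wrong locality"))
		return false;	/* command is lost; the default remains */

	tpm->timeout = value;
	return true;
}
```

Calling fake_tpm_set_timeout() before any locality is held drops the
write and leaves the default in place, which is the covered-up
start-of-day case.  Once active_locality matches, the same call takes
effect and no warning fires.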

James