Re: [PATCH v3] tpm: fix intermittent failure with self tests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I get merge conflicts in my tree but I'll review this.

On Mon, 2018-02-19 at 07:24 -0500, James Bottomley wrote:
> Ever since
> 
> commit 2482b1bba5122b1d5516c909832bdd282015b8e9
> Author: Alexander Steffen <Alexander.Steffen@xxxxxxxxxxxx>
> Date:   Thu Aug 31 19:18:56 2017 +0200
> 
>     tpm: Trigger only missing TPM 2.0 self tests
> 
> My Nuvoton 6xx in a Dell XPS-13 has been intermittently failing to
> work (necessitating a reboot). The problem seems to be that the TPM
> gets into a state where the partial self-test doesn't return
> TPM_RC_SUCCESS (meaning all tests have run to completion), but instead
> returns TPM_RC_TESTING (meaning some tests are still running in the
> background).  There are various theories that resending the self-test
> command actually causes the tests to restart and thus triggers more
> TPM_RC_TESTING returns until the timeout is exceeded.
> 
> There are several issues here: firstly being we shouldn't slow down
> the boot sequence waiting for the self test to complete once the TPM
> backgrounds them.  It will actually make available all functions that
> have passed and if it gets a failure return TPM_RC_FAILURE to every
> subsequent command.  So the fix is to kick off self tests once and if
> they return TPM_RC_TESTING log that as a backgrounded self test and
> continue on.  In order to prevent other tpm users from seeing any
> TPM_RC_TESTING returns (which it might if they send a command that
> needs a TPM subsystem which is still under test), we loop in
> tpm_transmit_cmd until either a timeout or we don't get a
> TPM_RC_TESTING return.
> 
> Finally, there have been observations of strange returns from a
> partial test. One Nuvoton is occasionally returning
> TPM_RC_COMMAND_CODE, so treat any unexpected return from a partial
> self test as an indication we need to run a full self test.
> 
> Signed-off-by: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
> Fixes: 2482b1bba5122b1d5516c909832bdd282015b8e9
> 
> ---
> v3: - must pass through TPM2_RC_INTIALIZE
> v2: - review feedback: encapsulate transmission and rename NOWAIT constant
>     - Rewrite selftest routine to run full test on any unexpected
>       (meaning not TPM2_RC_TESTING, TPM2_RC_SUCCESS or
>       TPM2_RC_FAILURE) from the partial selftest.
> ---
>  drivers/char/tpm/tpm-interface.c | 37 ++++++++++++++++++++++++++++-
>  drivers/char/tpm/tpm.h           |  6 +++--
>  drivers/char/tpm/tpm2-cmd.c      | 50 ++++++++++++++++++++++++++++++---------
> -
>  3 files changed, 78 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-
> interface.c
> index 9e80a953d693..b9a8914b05d9 100644
> --- a/drivers/char/tpm/tpm-interface.c
> +++ b/drivers/char/tpm/tpm-interface.c
> @@ -515,6 +515,41 @@ ssize_t tpm_transmit(struct tpm_chip *chip, struct
> tpm_space *space,
>  	return rc ? rc : len;
>  }
>  
> +static ssize_t tpm_transmit_check(struct tpm_chip *chip,
> +				  struct tpm_space *space, const void *buf,
> +				  size_t bufsiz, unsigned int flags)

I would like this function to be renamed as tpm_transmit() and the
current tpm_transmit() as tpm_try_transmit(). That would be a more
understandable naming. I think also that both functions would
deserve documentation because the behavior is starting to become
fairly complex.

> +{
> +	const struct tpm_output_header *header = buf;
> +	unsigned int delay_msec = 20;
> +	ssize_t len;
> +	u32 err;
> +
> +	/*
> +	 * on first probe we kick off a TPM self test in the
> +	 * background This means the TPM may return RC_TESTING to any
> +	 * command that tries to use a subsystem under test, so do an
> +	 * exponential backoff wait if that happens
> +	 */

This could be part of kdoc.

> +	for (;;) {
> +		len = tpm_transmit(chip, space, (u8 *)buf, bufsiz, flags);
> +		if (len <  0)
> +			return len;
> +
> +		err = be32_to_cpu(header->return_code);
> +		if (err != TPM2_RC_TESTING ||
> +		    (flags & TPM_TRANSMIT_NO_WAIT_TESTING))
> +			break;
> +
> +		delay_msec *= 2;
> +		if (delay_msec > TPM2_DURATION_LONG) {
> +			dev_err(&chip->dev,"TPM: still running self tests,
> giving up waiting\n");

's/TPM://'

> +			break;
> +		}
> +		tpm_msleep(delay_msec);
> +	}
> +	return len;
> +}
> +
>  /**
>   * tmp_transmit_cmd - send a tpm command to the device
>   *    The function extracts tpm out header return code
> @@ -540,7 +575,7 @@ ssize_t tpm_transmit_cmd(struct tpm_chip *chip, struct
> tpm_space *space,
>  	int err;
>  	ssize_t len;
>  
> -	len = tpm_transmit(chip, space, (u8 *)buf, bufsiz, flags);
> +	len = tpm_transmit_check(chip, space, buf, bufsiz, flags);
>  	if (len <  0)
>  		return len;
>  
> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> index f895fba4e20d..3e083a30a108 100644
> --- a/drivers/char/tpm/tpm.h
> +++ b/drivers/char/tpm/tpm.h
> @@ -104,6 +104,7 @@ enum tpm2_return_codes {
>  	TPM2_RC_HASH		= 0x0083, /* RC_FMT1 */
>  	TPM2_RC_HANDLE		= 0x008B,
>  	TPM2_RC_INITIALIZE	= 0x0100, /* RC_VER1 */
> +	TPM2_RC_FAILURE		= 0x0101,
>  	TPM2_RC_DISABLED	= 0x0120,
>  	TPM2_RC_COMMAND_CODE    = 0x0143,
>  	TPM2_RC_TESTING		= 0x090A, /* RC_WARN */
> @@ -499,8 +500,9 @@ extern const struct file_operations tpmrm_fops;
>  extern struct idr dev_nums_idr;
>  
>  enum tpm_transmit_flags {
> -	TPM_TRANSMIT_UNLOCKED	= BIT(0),
> -	TPM_TRANSMIT_RAW	= BIT(1),
> +	TPM_TRANSMIT_UNLOCKED		= BIT(0),
> +	TPM_TRANSMIT_RAW		= BIT(1),
> +	TPM_TRANSMIT_NO_WAIT_TESTING	= BIT(2),
>  };
>  
>  ssize_t tpm_transmit(struct tpm_chip *chip, struct tpm_space *space,
> diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c
> index e5052e2f1924..13d9e74084aa 100644
> --- a/drivers/char/tpm/tpm2-cmd.c
> +++ b/drivers/char/tpm/tpm2-cmd.c
> @@ -838,29 +838,55 @@ EXPORT_SYMBOL_GPL(tpm2_calc_ordinal_duration);
>  static int tpm2_do_selftest(struct tpm_chip *chip)
>  {
>  	int rc;
> -	unsigned int delay_msec = 10;
> -	long duration;
>  	struct tpm_buf buf;
> +	int full_test;
>  
> -	duration = jiffies_to_msecs(
> -		tpm2_calc_ordinal_duration(chip, TPM2_CC_SELF_TEST));
> +	for (full_test = 0; full_test < 2; full_test++) {
> +		const char *test_str = full_test ?
> +			"full selftest" : "incremental selftest";

This code is too "clever" without much of a real purpose. A better
idea would be to have a helper function for sending the self-test
command and just call it twice.

>  
> -	for (;;) {
>  		rc = tpm_buf_init(&buf, TPM2_ST_NO_SESSIONS,
> TPM2_CC_SELF_TEST);
>  		if (rc)
>  			return rc;
>  
> -		/* Perform tests in the background. */
> -		tpm_buf_append_u8(&buf, 0);
> +		tpm_buf_append_u8(&buf, full_test);
> +		dev_info(&chip->dev, "TPM: running %s\n", test_str);
>  		rc = tpm_transmit_cmd(chip, NULL, buf.data, PAGE_SIZE, 0, 0,
> -				      "continue selftest");
> +				      test_str);
>  		tpm_buf_destroy(&buf);
> -		if (rc != TPM2_RC_TESTING || delay_msec >= duration)
> +
> +		if (rc == TPM2_RC_INITIALIZE)
> +			/* TPM needs startup command */
> +			return rc;
> +
> +		if (rc == TPM2_RC_TESTING) {
> +			/*
> +			 * A return of RC_TESTING means the TPM is still
> +			 * running self tests.  If one fails it will go into
> +			 * failure mode and return RC_FAILED to every
> command,
> +			 * so treat a still in testing return as a success
> +			 * rather than causing a driver detach.
> +			 */
> +			dev_info(&chip->dev, "TPM: Running self test in
> background\n");
> +			rc = TPM2_RC_SUCCESS;
> +		}
> +		if (rc == TPM2_RC_SUCCESS)
>  			break;
>  
> -		/* wait longer than before */
> -		delay_msec *= 2;
> -		tpm_msleep(delay_msec);
> +		dev_info(&chip->dev, "TPM: %s failed\n", test_str);
> +	}
> +
> +	if (rc == TPM2_RC_SUCCESS) {
> +		dev_info(&chip->dev, "TPM: selftest succeeded\n");
> +	} else if (rc == TPM2_RC_FAILURE) {
> +		dev_err(&chip->dev, "TPM: selftest failed\n");
> +	} else {
> +		/*
> +		 * Any other error code is unexpected but should not
> +		 * prevent the driver from attaching
> +		 */
> +		dev_err(&chip->dev, "TPM: selftest returns 0x%x, continuing
> anyway\n", rc);
> +		rc = TPM2_RC_SUCCESS;
>  	}

Are these messages mandatory given that tpm_transmit_cmd() already
prints an error message? Could some or all of them removed?

Some other notes:

* tpm_transmit_cmd() uses %d as formatting while this uses %x
* Here it is called "selftest" and in the messages above it is
  called "self test".

/Jarkko



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux Kernel]     [Linux Kernel Hardening]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux