Re: [PATCH v2] tpm: Fix suspend/shutdown on some boards by preserving chip Locality

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jarkko,

I have an update here. I would like you to check if it makes sense
before I submit a patch.

The problem might be related to the chip itself which leaves the idle
state whenever the locality is relinquished.

I probed the chip while operating and noted that the
TPM_CRB_CTRL_STS_0.tpmIdle bit (located at `regs_t->ctrl_sts` on
`crb_priv` structure in tpm_crb.c) is always cleared whenever the
locality is relinquished.

The result is that this chip never becomes idle since after going idle
the locality is relinquished (`tpm_chip_stop` function in tpm-chip.c).

I patched the 7.8.0 kernel for bypassing the `tpm_relinquish_locality`
call and so far the PC is suspending and power off normally,
TPM_CRB_CTRL_STS_0.tpmIdle remains set whenever the device is not used
now.

I believe this is related to this specific chip, if the fix remains
working for 2 weeks I will submit the patch. Please advise me if it
makes sense.

Best regards,
Adam

Em qua., 13 de mar. de 2024 às 14:02, Adam Alves <adamoa@xxxxxxxxx> escreveu:
>
> Hi Jarkko,
>
> Thank you very much for kindly reviewing this proposal.
>
> After one week without any issues with my PC hanging, it happened
> again. It seems that the fix I am proposing is not final (it only
> reduced the frequency since it always happened when I shutdown after
> couple hours of power up time and now it only happened after two weeks
> with a similar usage rate).
>
> I will share with you the data you requested below.
>
> > The lacking information here is the CPU model (/proc/cpuinfo), on
> > which kernel version the bug was produced and what kind of TPM the
> > system has (discrete chip or firmware TPM should be easy to check
> > from BIOS).
>
> CPU model: Intel(R) Core(TM) i7-10700F CPU @ 2.90GHz
> I am attaching data from /proc/cpuinfo
>
> TPM: No info on mainboard documentation regarding TPM. BIOS is not
> clear whether or not it is discrete or firmware. Based on dmidecode
> (attached) I get the following:
> TPM Device
>         Vendor ID: INTC
>         Specification Version: 2.0
>         Firmware Revision: 500.16
>         Description: INTEL
>         Characteristics:
>                 Family configurable via platform software support
>         OEM-specific Information: 0x00000000
> I also extracted TPM_CRB_INTF_ID_0 from the TPM: `a13a808600084311`
> (Vendor ID 8086, Device ID a13a, Revision ID 00). The only match I
> found while browsing for this device ID is 100 Series/C230 Series
> Chipset Family MEI Controller #1, which is a PCI device, so it might
> not be related to the TPM.
>
> The driver bound to the tpm0 device is tpm_crb. The disassembled TPM2
> ACPI table is also attached in case it helps.
>
> The bug was reproduced from upstream kernel version 6.8.0 (attached
> build .config that I used).
>
> > Also, which firmwre version you have and have you tested with the
> > most up to date firmware (BIOS)?
>
> I have the most updated firmware provided by ASUS: TUF GAMING
> B460M-PLUS BIOS 1601
>
> > What is "the ACPI command"? Refer to concrete items instead of
> > asking to guess what you is the ACPI command for you.
>
> I enabled ACPI_DEBUG on my kernel to know where the it was actually
> hanging. The last function is actually the last function that should
> be called by the kernel for a successful shutdown:
> hwsleep-0078 hw_legacy_sleep       : Entering sleep state [S5]^M
> hwregs-0460 hw_write_pm1_control  : ----Entry^M
> hwvalid-0097 hw_validate_io_request: ----Entry^M
> hwvalid-0111 hw_validate_io_request: Address 0000000000001804
> LastAddress 0000000000001805 Length 2  hwvalid-0128
> hw_validate_io_request: ----Exit- AE_OK^M
> hwregs-0360 hw_write              : Wrote: 0000000000001C01 width 16
> to 0000000000001804 (SystemIO)^M
> hwregs-0473 hw_write_pm1_control  : ----Exit- AE_OK^M
> hwregs-0460 hw_write_pm1_control  : ----Entry^M
> hwvalid-0097 hw_validate_io_request: ----Entry^M
> hwvalid-0111 hw_validate_io_request: Address 0000000000001804
> LastAddress 0000000000001805 Length 2  hwvalid-0128
> hw_validate_io_request: ----Exit- AE_OK^M
>
> It is writing both SLP_TYP + SLP_EN to ACPI PM1b_CNT registers (as
> expected by specification). I checked the flags and it is in line with
> the flags required by the system ACPI tables.
>
> I understand from that that the system is hanging after ACPI firmware
> takes over. The same issue happens if I force a EFI shutdown.
>
> Since the the BUG has appeared again even with the fix implemented, I
> am holding this patch for now until I find a solution that permanently
> fixes the issue. The next time I try to submit a patch that will
> comply with all your suggestions, thank you very much.
>
> I would appreciate if you had any hint on how I could keep digging to
> find the issue that might be causing this bug. This is an issue that
> only happens when I shutdown from Linux and my TPM is activated in
> BIOS. That's why my guess is that this is what should be causing it.
> From Windows, shutdown is always flawless.
>
> Best regards,
> Adam
>
> >
> > > chip expecting it to be in Locality 0 as expected by TCG PC Client
> > > Platform Firmware Profile Version 1.06 Revision 52 (3.1.1 – Pre-OS
> > > Environment) and then when it fails to do so it simply halts the
> > > whole system.
> >
> > We don't speculate about the root cause here, only document it.
> > Please move this paragraph before diffstat (see below)>
> >
> > > Enable a user to configure the kernel through
> > > “tpm.locality_on_suspend=1” boot parameter so that the locality is set
> > > before suspend/shutdown in order to diagnose whether or not the board is
> > > one of the buggy ones that require this workaround. Since this bug is
> > > related to the board/platform instead of the specific TPM chip, call
> > > dmi_check_system on the tpm_init function so that this setting is
> > > automatically enabled for boards specified in code (ASUS TUF GAMING
> > > B460M-PLUS already included) – automatic configuration only works in
> > > case CONFIG_DMI is set though, since dmi_check_system is a non-op when
> > > CONFIG_DMI is not set.
> >
> > Please describe what the *kernel command-line" (for clarity
> > sake) semantically means.
> >
> > Also please remove anything about diangnosing. We care only
> > about fixes.
> >
> > >
> > > In case “tpm.locality_on_suspend=0” (the default) don't change any
> > > behavior thus preserving current functionality of any other board
> > > except ASUSTeK COMPUTER INC. TUF GAMING B460M-PLUS and possibly future
> > > boards as we successfully diagnose other boards with the same issue
> > > fixed by using “tpm.locality_on_suspend=1”.
> >
> > This neither documents the default value. I'm also lost did setting
> > this "1" or "0" fix the issue in your case?
> >
> > So: firmware version and being up-to-date is important and also this
> > needs to be reproduciable with the mainline Linux tree, not distro
> > kernel or custom kernel.
> >
> > >
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217890
> > > Signed-off-by: Adam Alves <adamoa@xxxxxxxxx>
> > > ---
> >
> > <cover letter>
> >
> > OK, I'll try to check what is done here but please re-read
> > "describing your changes" before sending next version:
> >
> > https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
> >
> > > v1->v2: fix formatting issues and simplified tpm_chip_stop code.
> > >
> > >  drivers/char/tpm/tpm-chip.c      | 12 +++++++++++
> > >  drivers/char/tpm/tpm-interface.c | 37 ++++++++++++++++++++++++++++++++
> > >  drivers/char/tpm/tpm.h           |  1 +
> > >  include/linux/tpm.h              |  1 +
> > >  4 files changed, 51 insertions(+)
> > >
> > > diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> > > index 42b1062e33cd..a183e1355289 100644
> > > --- a/drivers/char/tpm/tpm-chip.c
> > > +++ b/drivers/char/tpm/tpm-chip.c
> > > @@ -137,6 +137,12 @@ EXPORT_SYMBOL_GPL(tpm_chip_start);
> > >   */
> > >  void tpm_chip_stop(struct tpm_chip *chip)
> > >  {
> > > +     if (chip->flags & TPM_CHIP_FLAG_PRESERVE_LOCALITY) {
> >
> > The commit message did not explain what this flag is and what is its
> > purpose.
> >
> > Also why you need to populate global flag inside chip, or the value
> > of it?
> >
> > Why this is not just:
> >
> >         if (tpm_locality_on_suspend) {
> > ?
> >
> >
> > > +             if (chip->locality != 0)
> > > +                     tpm_request_locality(chip);
> >
> > This will unconditionally skip calling tpm_request_locality() because
> > Linux only uses locality 0. Not sure what good does this make.
> >
> > > +             return;
> > > +     }
> > > +
> > >       tpm_go_idle(chip);
> > >       tpm_relinquish_locality(chip);
> > >       tpm_clk_disable(chip);
> > > @@ -291,6 +297,9 @@ int tpm_class_shutdown(struct device *dev)
> > >  {
> > >       struct tpm_chip *chip = container_of(dev, struct tpm_chip, dev);
> > >
> > > +     if (tpm_locality_on_suspend)
> > > +             chip->flags |= TPM_CHIP_FLAG_PRESERVE_LOCALITY;
> > > +
> > >       down_write(&chip->ops_sem);
> > >       if (chip->flags & TPM_CHIP_FLAG_TPM2) {
> > >               if (!tpm_chip_start(chip)) {
> > > @@ -668,6 +677,9 @@ EXPORT_SYMBOL_GPL(tpm_chip_register);
> > >   */
> > >  void tpm_chip_unregister(struct tpm_chip *chip)
> > >  {
> > > +     if (tpm_locality_on_suspend)
> > > +             chip->flags |= TPM_CHIP_FLAG_PRESERVE_LOCALITY;
> > > +
> > >       tpm_del_legacy_sysfs(chip);
> > >       if (tpm_is_hwrng_enabled(chip))
> > >               hwrng_unregister(&chip->hwrng);
> > > diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
> > > index 66b16d26eecc..7f770ea98402 100644
> > > --- a/drivers/char/tpm/tpm-interface.c
> > > +++ b/drivers/char/tpm/tpm-interface.c
> > > @@ -26,6 +26,7 @@
> > >  #include <linux/suspend.h>
> > >  #include <linux/freezer.h>
> > >  #include <linux/tpm_eventlog.h>
> > > +#include <linux/dmi.h>
> > >
> > >  #include "tpm.h"
> > >
> > > @@ -382,6 +383,36 @@ int tpm_auto_startup(struct tpm_chip *chip)
> > >       return rc;
> > >  }
> > >
> > > +/*
> > > + * Bug workaround - some boards expect the TPM to be on Locality 0
> > > + * before suspend/shutdown, halting the system otherwise before
> > > + * suspend and shutdown. Change suspend behavior for these cases.
> > > + */
> > > +bool tpm_locality_on_suspend;
> > > +module_param_named(locality_on_suspend, tpm_locality_on_suspend, bool, 0644);
> > > +MODULE_PARM_DESC(locality_on_suspend,
> > > +              "Put TPM at locality 0 before suspend/shutdown.");
> > > +
> > > +static int __init tpm_set_locality_on_suspend(const struct dmi_system_id *system_id)
> > > +{
> > > +     pr_info("Board %s: TPM locality preserved before suspend/shutdown.\n",
> > > +             system_id->ident);
> >
> > Please remove pr_info(), we do not want to bloat klog.
> >
> > > +     tpm_locality_on_suspend = true;
> > > +
> > > +     return 0;
> > > +}
> > > +
> > > +static const struct dmi_system_id tpm_board_quirks[] __initconst = {
> >
> > The commit message did not introduce this. Also should have inline
> > documentation.
> >
> > /*
> >  * What the heck this.
> >  */
> >
> > > +     {
> > > +             .ident = "TUF GAMING B460M-PLUS",
> > > +             .matches = {
> > > +                     DMI_MATCH(DMI_BOARD_VENDOR, "ASUSTeK COMPUTER INC."),
> > > +                     DMI_MATCH(DMI_BOARD_NAME, "TUF GAMING B460M-PLUS"),
> > > +             },
> > > +             .callback = tpm_set_locality_on_suspend,
> > > +     },
> > > +};
> > > +
> > >  /*
> > >   * We are about to suspend. Save the TPM state
> > >   * so that it can be restored.
> > > @@ -394,6 +425,9 @@ int tpm_pm_suspend(struct device *dev)
> > >       if (!chip)
> > >               return -ENODEV;
> > >
> > > +     if (tpm_locality_on_suspend)
> > > +             chip->flags |= TPM_CHIP_FLAG_PRESERVE_LOCALITY;
> > > +
> > >       if (chip->flags & TPM_CHIP_FLAG_ALWAYS_POWERED)
> > >               goto suspended;
> > >
> > > @@ -431,6 +465,7 @@ int tpm_pm_resume(struct device *dev)
> > >       if (chip == NULL)
> > >               return -ENODEV;
> > >
> > > +     chip->flags &= ~TPM_CHIP_FLAG_PRESERVE_LOCALITY;
> > >       chip->flags &= ~TPM_CHIP_FLAG_SUSPENDED;
> > >
> > >       /*
> > > @@ -476,6 +511,8 @@ static int __init tpm_init(void)
> > >  {
> > >       int rc;
> > >
> > > +     dmi_check_system(tpm_board_quirks);
> > > +
> > >       rc = class_register(&tpm_class);
> > >       if (rc) {
> > >               pr_err("couldn't create tpm class\n");
> > > diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> > > index 61445f1dc46d..f2657b611b81 100644
> > > --- a/drivers/char/tpm/tpm.h
> > > +++ b/drivers/char/tpm/tpm.h
> > > @@ -236,6 +236,7 @@ extern dev_t tpm_devt;
> > >  extern const struct file_operations tpm_fops;
> > >  extern const struct file_operations tpmrm_fops;
> > >  extern struct idr dev_nums_idr;
> > > +extern bool tpm_locality_on_suspend;
> > >
> > >  ssize_t tpm_transmit(struct tpm_chip *chip, u8 *buf, size_t bufsiz);
> > >  int tpm_get_timeouts(struct tpm_chip *);
> > > diff --git a/include/linux/tpm.h b/include/linux/tpm.h
> > > index 4ee9d13749ad..1fbb33f386d1 100644
> > > --- a/include/linux/tpm.h
> > > +++ b/include/linux/tpm.h
> > > @@ -284,6 +284,7 @@ enum tpm_chip_flags {
> > >       TPM_CHIP_FLAG_FIRMWARE_UPGRADE          = BIT(7),
> > >       TPM_CHIP_FLAG_SUSPENDED                 = BIT(8),
> > >       TPM_CHIP_FLAG_HWRNG_DISABLED            = BIT(9),
> > > +     TPM_CHIP_FLAG_PRESERVE_LOCALITY         = BIT(10),
> > >  };
> > >
> > >  #define to_tpm_chip(d) container_of(d, struct tpm_chip, dev)
> >
> >
> > BR, Jarkko
>
>
>
> --
> Adam Oliveira Alves



-- 
Adam Oliveira Alves





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux Kernel]     [Linux Kernel Hardening]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux