Re: [PATCH v2] tpm: Fix suspend/shutdown on some boards by preserving chip Locality

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Fixing a typo: I patched the 6.8.0 kernel.

Em qui., 14 de mar. de 2024 às 13:31, Adam Alves <adamoa@xxxxxxxxx> escreveu:
>
> Hi Jarkko,
>
> I have an update here. I would like you to check if it makes sense
> before I submit a patch.
>
> The problem might be related to the chip itself which leaves the idle
> state whenever the locality is relinquished.
>
> I probed the chip while operating and noted that the
> TPM_CRB_CTRL_STS_0.tpmIdle bit (located at `regs_t->ctrl_sts` on
> `crb_priv` structure in tpm_crb.c) is always cleared whenever the
> locality is relinquished.
>
> The result is that this chip never becomes idle since after going idle
> the locality is relinquished (`tpm_chip_stop` function in tpm-chip.c).
>
> I patched the 7.8.0 kernel for bypassing the `tpm_relinquish_locality`
> call and so far the PC is suspending and power off normally,
> TPM_CRB_CTRL_STS_0.tpmIdle remains set whenever the device is not used
> now.
>
> I believe this is related to this specific chip, if the fix remains
> working for 2 weeks I will submit the patch. Please advise me if it
> makes sense.
>
> Best regards,
> Adam
>
> Em qua., 13 de mar. de 2024 às 14:02, Adam Alves <adamoa@xxxxxxxxx> escreveu:
> >
> > Hi Jarkko,
> >
> > Thank you very much for kindly reviewing this proposal.
> >
> > After one week without any issues with my PC hanging, it happened
> > again. It seems that the fix I am proposing is not final (it only
> > reduced the frequency since it always happened when I shutdown after
> > couple hours of power up time and now it only happened after two weeks
> > with a similar usage rate).
> >
> > I will share with you the data you requested below.
> >
> > > The lacking information here is the CPU model (/proc/cpuinfo), on
> > > which kernel version the bug was produced and what kind of TPM the
> > > system has (discrete chip or firmware TPM should be easy to check
> > > from BIOS).
> >
> > CPU model: Intel(R) Core(TM) i7-10700F CPU @ 2.90GHz
> > I am attaching data from /proc/cpuinfo
> >
> > TPM: No info on mainboard documentation regarding TPM. BIOS is not
> > clear whether or not it is discrete or firmware. Based on dmidecode
> > (attached) I get the following:
> > TPM Device
> >         Vendor ID: INTC
> >         Specification Version: 2.0
> >         Firmware Revision: 500.16
> >         Description: INTEL
> >         Characteristics:
> >                 Family configurable via platform software support
> >         OEM-specific Information: 0x00000000
> > I also extracted TPM_CRB_INTF_ID_0 from the TPM: `a13a808600084311`
> > (Vendor ID 8086, Device ID a13a, Revision ID 00). The only match I
> > found while browsing for this device ID is 100 Series/C230 Series
> > Chipset Family MEI Controller #1, which is a PCI device, so it might
> > not be related to the TPM.
> >
> > The driver bound to the tpm0 device is tpm_crb. The disassembled TPM2
> > ACPI table is also attached in case it helps.
> >
> > The bug was reproduced from upstream kernel version 6.8.0 (attached
> > build .config that I used).
> >
> > > Also, which firmwre version you have and have you tested with the
> > > most up to date firmware (BIOS)?
> >
> > I have the most updated firmware provided by ASUS: TUF GAMING
> > B460M-PLUS BIOS 1601
> >
> > > What is "the ACPI command"? Refer to concrete items instead of
> > > asking to guess what you is the ACPI command for you.
> >
> > I enabled ACPI_DEBUG on my kernel to know where the it was actually
> > hanging. The last function is actually the last function that should
> > be called by the kernel for a successful shutdown:
> > hwsleep-0078 hw_legacy_sleep       : Entering sleep state [S5]^M
> > hwregs-0460 hw_write_pm1_control  : ----Entry^M
> > hwvalid-0097 hw_validate_io_request: ----Entry^M
> > hwvalid-0111 hw_validate_io_request: Address 0000000000001804
> > LastAddress 0000000000001805 Length 2  hwvalid-0128
> > hw_validate_io_request: ----Exit- AE_OK^M
> > hwregs-0360 hw_write              : Wrote: 0000000000001C01 width 16
> > to 0000000000001804 (SystemIO)^M
> > hwregs-0473 hw_write_pm1_control  : ----Exit- AE_OK^M
> > hwregs-0460 hw_write_pm1_control  : ----Entry^M
> > hwvalid-0097 hw_validate_io_request: ----Entry^M
> > hwvalid-0111 hw_validate_io_request: Address 0000000000001804
> > LastAddress 0000000000001805 Length 2  hwvalid-0128
> > hw_validate_io_request: ----Exit- AE_OK^M
> >
> > It is writing both SLP_TYP + SLP_EN to ACPI PM1b_CNT registers (as
> > expected by specification). I checked the flags and it is in line with
> > the flags required by the system ACPI tables.
> >
> > I understand from that that the system is hanging after ACPI firmware
> > takes over. The same issue happens if I force a EFI shutdown.
> >
> > Since the the BUG has appeared again even with the fix implemented, I
> > am holding this patch for now until I find a solution that permanently
> > fixes the issue. The next time I try to submit a patch that will
> > comply with all your suggestions, thank you very much.
> >
> > I would appreciate if you had any hint on how I could keep digging to
> > find the issue that might be causing this bug. This is an issue that
> > only happens when I shutdown from Linux and my TPM is activated in
> > BIOS. That's why my guess is that this is what should be causing it.
> > From Windows, shutdown is always flawless.
> >
> > Best regards,
> > Adam
> >
> > >
> > > > chip expecting it to be in Locality 0 as expected by TCG PC Client
> > > > Platform Firmware Profile Version 1.06 Revision 52 (3.1.1 – Pre-OS
> > > > Environment) and then when it fails to do so it simply halts the
> > > > whole system.
> > >
> > > We don't speculate about the root cause here, only document it.
> > > Please move this paragraph before diffstat (see below)>
> > >
> > > > Enable a user to configure the kernel through
> > > > “tpm.locality_on_suspend=1” boot parameter so that the locality is set
> > > > before suspend/shutdown in order to diagnose whether or not the board is
> > > > one of the buggy ones that require this workaround. Since this bug is
> > > > related to the board/platform instead of the specific TPM chip, call
> > > > dmi_check_system on the tpm_init function so that this setting is
> > > > automatically enabled for boards specified in code (ASUS TUF GAMING
> > > > B460M-PLUS already included) – automatic configuration only works in
> > > > case CONFIG_DMI is set though, since dmi_check_system is a non-op when
> > > > CONFIG_DMI is not set.
> > >
> > > Please describe what the *kernel command-line" (for clarity
> > > sake) semantically means.
> > >
> > > Also please remove anything about diangnosing. We care only
> > > about fixes.
> > >
> > > >
> > > > In case “tpm.locality_on_suspend=0” (the default) don't change any
> > > > behavior thus preserving current functionality of any other board
> > > > except ASUSTeK COMPUTER INC. TUF GAMING B460M-PLUS and possibly future
> > > > boards as we successfully diagnose other boards with the same issue
> > > > fixed by using “tpm.locality_on_suspend=1”.
> > >
> > > This neither documents the default value. I'm also lost did setting
> > > this "1" or "0" fix the issue in your case?
> > >
> > > So: firmware version and being up-to-date is important and also this
> > > needs to be reproduciable with the mainline Linux tree, not distro
> > > kernel or custom kernel.
> > >
> > > >
> > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217890
> > > > Signed-off-by: Adam Alves <adamoa@xxxxxxxxx>
> > > > ---
> > >
> > > <cover letter>
> > >
> > > OK, I'll try to check what is done here but please re-read
> > > "describing your changes" before sending next version:
> > >
> > > https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
> > >
> > > > v1->v2: fix formatting issues and simplified tpm_chip_stop code.
> > > >
> > > >  drivers/char/tpm/tpm-chip.c      | 12 +++++++++++
> > > >  drivers/char/tpm/tpm-interface.c | 37 ++++++++++++++++++++++++++++++++
> > > >  drivers/char/tpm/tpm.h           |  1 +
> > > >  include/linux/tpm.h              |  1 +
> > > >  4 files changed, 51 insertions(+)
> > > >
> > > > diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> > > > index 42b1062e33cd..a183e1355289 100644
> > > > --- a/drivers/char/tpm/tpm-chip.c
> > > > +++ b/drivers/char/tpm/tpm-chip.c
> > > > @@ -137,6 +137,12 @@ EXPORT_SYMBOL_GPL(tpm_chip_start);
> > > >   */
> > > >  void tpm_chip_stop(struct tpm_chip *chip)
> > > >  {
> > > > +     if (chip->flags & TPM_CHIP_FLAG_PRESERVE_LOCALITY) {
> > >
> > > The commit message did not explain what this flag is and what is its
> > > purpose.
> > >
> > > Also why you need to populate global flag inside chip, or the value
> > > of it?
> > >
> > > Why this is not just:
> > >
> > >         if (tpm_locality_on_suspend) {
> > > ?
> > >
> > >
> > > > +             if (chip->locality != 0)
> > > > +                     tpm_request_locality(chip);
> > >
> > > This will unconditionally skip calling tpm_request_locality() because
> > > Linux only uses locality 0. Not sure what good does this make.
> > >
> > > > +             return;
> > > > +     }
> > > > +
> > > >       tpm_go_idle(chip);
> > > >       tpm_relinquish_locality(chip);
> > > >       tpm_clk_disable(chip);
> > > > @@ -291,6 +297,9 @@ int tpm_class_shutdown(struct device *dev)
> > > >  {
> > > >       struct tpm_chip *chip = container_of(dev, struct tpm_chip, dev);
> > > >
> > > > +     if (tpm_locality_on_suspend)
> > > > +             chip->flags |= TPM_CHIP_FLAG_PRESERVE_LOCALITY;
> > > > +
> > > >       down_write(&chip->ops_sem);
> > > >       if (chip->flags & TPM_CHIP_FLAG_TPM2) {
> > > >               if (!tpm_chip_start(chip)) {
> > > > @@ -668,6 +677,9 @@ EXPORT_SYMBOL_GPL(tpm_chip_register);
> > > >   */
> > > >  void tpm_chip_unregister(struct tpm_chip *chip)
> > > >  {
> > > > +     if (tpm_locality_on_suspend)
> > > > +             chip->flags |= TPM_CHIP_FLAG_PRESERVE_LOCALITY;
> > > > +
> > > >       tpm_del_legacy_sysfs(chip);
> > > >       if (tpm_is_hwrng_enabled(chip))
> > > >               hwrng_unregister(&chip->hwrng);
> > > > diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
> > > > index 66b16d26eecc..7f770ea98402 100644
> > > > --- a/drivers/char/tpm/tpm-interface.c
> > > > +++ b/drivers/char/tpm/tpm-interface.c
> > > > @@ -26,6 +26,7 @@
> > > >  #include <linux/suspend.h>
> > > >  #include <linux/freezer.h>
> > > >  #include <linux/tpm_eventlog.h>
> > > > +#include <linux/dmi.h>
> > > >
> > > >  #include "tpm.h"
> > > >
> > > > @@ -382,6 +383,36 @@ int tpm_auto_startup(struct tpm_chip *chip)
> > > >       return rc;
> > > >  }
> > > >
> > > > +/*
> > > > + * Bug workaround - some boards expect the TPM to be on Locality 0
> > > > + * before suspend/shutdown, halting the system otherwise before
> > > > + * suspend and shutdown. Change suspend behavior for these cases.
> > > > + */
> > > > +bool tpm_locality_on_suspend;
> > > > +module_param_named(locality_on_suspend, tpm_locality_on_suspend, bool, 0644);
> > > > +MODULE_PARM_DESC(locality_on_suspend,
> > > > +              "Put TPM at locality 0 before suspend/shutdown.");
> > > > +
> > > > +static int __init tpm_set_locality_on_suspend(const struct dmi_system_id *system_id)
> > > > +{
> > > > +     pr_info("Board %s: TPM locality preserved before suspend/shutdown.\n",
> > > > +             system_id->ident);
> > >
> > > Please remove pr_info(), we do not want to bloat klog.
> > >
> > > > +     tpm_locality_on_suspend = true;
> > > > +
> > > > +     return 0;
> > > > +}
> > > > +
> > > > +static const struct dmi_system_id tpm_board_quirks[] __initconst = {
> > >
> > > The commit message did not introduce this. Also should have inline
> > > documentation.
> > >
> > > /*
> > >  * What the heck this.
> > >  */
> > >
> > > > +     {
> > > > +             .ident = "TUF GAMING B460M-PLUS",
> > > > +             .matches = {
> > > > +                     DMI_MATCH(DMI_BOARD_VENDOR, "ASUSTeK COMPUTER INC."),
> > > > +                     DMI_MATCH(DMI_BOARD_NAME, "TUF GAMING B460M-PLUS"),
> > > > +             },
> > > > +             .callback = tpm_set_locality_on_suspend,
> > > > +     },
> > > > +};
> > > > +
> > > >  /*
> > > >   * We are about to suspend. Save the TPM state
> > > >   * so that it can be restored.
> > > > @@ -394,6 +425,9 @@ int tpm_pm_suspend(struct device *dev)
> > > >       if (!chip)
> > > >               return -ENODEV;
> > > >
> > > > +     if (tpm_locality_on_suspend)
> > > > +             chip->flags |= TPM_CHIP_FLAG_PRESERVE_LOCALITY;
> > > > +
> > > >       if (chip->flags & TPM_CHIP_FLAG_ALWAYS_POWERED)
> > > >               goto suspended;
> > > >
> > > > @@ -431,6 +465,7 @@ int tpm_pm_resume(struct device *dev)
> > > >       if (chip == NULL)
> > > >               return -ENODEV;
> > > >
> > > > +     chip->flags &= ~TPM_CHIP_FLAG_PRESERVE_LOCALITY;
> > > >       chip->flags &= ~TPM_CHIP_FLAG_SUSPENDED;
> > > >
> > > >       /*
> > > > @@ -476,6 +511,8 @@ static int __init tpm_init(void)
> > > >  {
> > > >       int rc;
> > > >
> > > > +     dmi_check_system(tpm_board_quirks);
> > > > +
> > > >       rc = class_register(&tpm_class);
> > > >       if (rc) {
> > > >               pr_err("couldn't create tpm class\n");
> > > > diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> > > > index 61445f1dc46d..f2657b611b81 100644
> > > > --- a/drivers/char/tpm/tpm.h
> > > > +++ b/drivers/char/tpm/tpm.h
> > > > @@ -236,6 +236,7 @@ extern dev_t tpm_devt;
> > > >  extern const struct file_operations tpm_fops;
> > > >  extern const struct file_operations tpmrm_fops;
> > > >  extern struct idr dev_nums_idr;
> > > > +extern bool tpm_locality_on_suspend;
> > > >
> > > >  ssize_t tpm_transmit(struct tpm_chip *chip, u8 *buf, size_t bufsiz);
> > > >  int tpm_get_timeouts(struct tpm_chip *);
> > > > diff --git a/include/linux/tpm.h b/include/linux/tpm.h
> > > > index 4ee9d13749ad..1fbb33f386d1 100644
> > > > --- a/include/linux/tpm.h
> > > > +++ b/include/linux/tpm.h
> > > > @@ -284,6 +284,7 @@ enum tpm_chip_flags {
> > > >       TPM_CHIP_FLAG_FIRMWARE_UPGRADE          = BIT(7),
> > > >       TPM_CHIP_FLAG_SUSPENDED                 = BIT(8),
> > > >       TPM_CHIP_FLAG_HWRNG_DISABLED            = BIT(9),
> > > > +     TPM_CHIP_FLAG_PRESERVE_LOCALITY         = BIT(10),
> > > >  };
> > > >
> > > >  #define to_tpm_chip(d) container_of(d, struct tpm_chip, dev)
> > >
> > >
> > > BR, Jarkko
> >
> >
> >
> > --
> > Adam Oliveira Alves
>
>
>
> --
> Adam Oliveira Alves



-- 
Adam Oliveira Alves





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux Kernel]     [Linux Kernel Hardening]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux SCSI]

  Powered by Linux