Hi Jarkko, I have an update here. I would like you to check if it makes sense before I submit a patch. The problem might be related to the chip itself which leaves the idle state whenever the locality is relinquished. I probed the chip while operating and noted that the TPM_CRB_CTRL_STS_0.tpmIdle bit (located at `regs_t->ctrl_sts` on `crb_priv` structure in tpm_crb.c) is always cleared whenever the locality is relinquished. The result is that this chip never becomes idle since after going idle the locality is relinquished (`tpm_chip_stop` function in tpm-chip.c). I patched the 7.8.0 kernel for bypassing the `tpm_relinquish_locality` call and so far the PC is suspending and power off normally, TPM_CRB_CTRL_STS_0.tpmIdle remains set whenever the device is not used now. I believe this is related to this specific chip, if the fix remains working for 2 weeks I will submit the patch. Please advise me if it makes sense. Best regards, Adam Em qua., 13 de mar. de 2024 às 14:02, Adam Alves <adamoa@xxxxxxxxx> escreveu: > > Hi Jarkko, > > Thank you very much for kindly reviewing this proposal. > > After one week without any issues with my PC hanging, it happened > again. It seems that the fix I am proposing is not final (it only > reduced the frequency since it always happened when I shutdown after > couple hours of power up time and now it only happened after two weeks > with a similar usage rate). > > I will share with you the data you requested below. > > > The lacking information here is the CPU model (/proc/cpuinfo), on > > which kernel version the bug was produced and what kind of TPM the > > system has (discrete chip or firmware TPM should be easy to check > > from BIOS). > > CPU model: Intel(R) Core(TM) i7-10700F CPU @ 2.90GHz > I am attaching data from /proc/cpuinfo > > TPM: No info on mainboard documentation regarding TPM. BIOS is not > clear whether or not it is discrete or firmware. Based on dmidecode > (attached) I get the following: > TPM Device > Vendor ID: INTC > Specification Version: 2.0 > Firmware Revision: 500.16 > Description: INTEL > Characteristics: > Family configurable via platform software support > OEM-specific Information: 0x00000000 > I also extracted TPM_CRB_INTF_ID_0 from the TPM: `a13a808600084311` > (Vendor ID 8086, Device ID a13a, Revision ID 00). The only match I > found while browsing for this device ID is 100 Series/C230 Series > Chipset Family MEI Controller #1, which is a PCI device, so it might > not be related to the TPM. > > The driver bound to the tpm0 device is tpm_crb. The disassembled TPM2 > ACPI table is also attached in case it helps. > > The bug was reproduced from upstream kernel version 6.8.0 (attached > build .config that I used). > > > Also, which firmwre version you have and have you tested with the > > most up to date firmware (BIOS)? > > I have the most updated firmware provided by ASUS: TUF GAMING > B460M-PLUS BIOS 1601 > > > What is "the ACPI command"? Refer to concrete items instead of > > asking to guess what you is the ACPI command for you. > > I enabled ACPI_DEBUG on my kernel to know where the it was actually > hanging. The last function is actually the last function that should > be called by the kernel for a successful shutdown: > hwsleep-0078 hw_legacy_sleep : Entering sleep state [S5]^M > hwregs-0460 hw_write_pm1_control : ----Entry^M > hwvalid-0097 hw_validate_io_request: ----Entry^M > hwvalid-0111 hw_validate_io_request: Address 0000000000001804 > LastAddress 0000000000001805 Length 2 hwvalid-0128 > hw_validate_io_request: ----Exit- AE_OK^M > hwregs-0360 hw_write : Wrote: 0000000000001C01 width 16 > to 0000000000001804 (SystemIO)^M > hwregs-0473 hw_write_pm1_control : ----Exit- AE_OK^M > hwregs-0460 hw_write_pm1_control : ----Entry^M > hwvalid-0097 hw_validate_io_request: ----Entry^M > hwvalid-0111 hw_validate_io_request: Address 0000000000001804 > LastAddress 0000000000001805 Length 2 hwvalid-0128 > hw_validate_io_request: ----Exit- AE_OK^M > > It is writing both SLP_TYP + SLP_EN to ACPI PM1b_CNT registers (as > expected by specification). I checked the flags and it is in line with > the flags required by the system ACPI tables. > > I understand from that that the system is hanging after ACPI firmware > takes over. The same issue happens if I force a EFI shutdown. > > Since the the BUG has appeared again even with the fix implemented, I > am holding this patch for now until I find a solution that permanently > fixes the issue. The next time I try to submit a patch that will > comply with all your suggestions, thank you very much. > > I would appreciate if you had any hint on how I could keep digging to > find the issue that might be causing this bug. This is an issue that > only happens when I shutdown from Linux and my TPM is activated in > BIOS. That's why my guess is that this is what should be causing it. > From Windows, shutdown is always flawless. > > Best regards, > Adam > > > > > > chip expecting it to be in Locality 0 as expected by TCG PC Client > > > Platform Firmware Profile Version 1.06 Revision 52 (3.1.1 – Pre-OS > > > Environment) and then when it fails to do so it simply halts the > > > whole system. > > > > We don't speculate about the root cause here, only document it. > > Please move this paragraph before diffstat (see below)> > > > > > Enable a user to configure the kernel through > > > “tpm.locality_on_suspend=1” boot parameter so that the locality is set > > > before suspend/shutdown in order to diagnose whether or not the board is > > > one of the buggy ones that require this workaround. Since this bug is > > > related to the board/platform instead of the specific TPM chip, call > > > dmi_check_system on the tpm_init function so that this setting is > > > automatically enabled for boards specified in code (ASUS TUF GAMING > > > B460M-PLUS already included) – automatic configuration only works in > > > case CONFIG_DMI is set though, since dmi_check_system is a non-op when > > > CONFIG_DMI is not set. > > > > Please describe what the *kernel command-line" (for clarity > > sake) semantically means. > > > > Also please remove anything about diangnosing. We care only > > about fixes. > > > > > > > > In case “tpm.locality_on_suspend=0” (the default) don't change any > > > behavior thus preserving current functionality of any other board > > > except ASUSTeK COMPUTER INC. TUF GAMING B460M-PLUS and possibly future > > > boards as we successfully diagnose other boards with the same issue > > > fixed by using “tpm.locality_on_suspend=1”. > > > > This neither documents the default value. I'm also lost did setting > > this "1" or "0" fix the issue in your case? > > > > So: firmware version and being up-to-date is important and also this > > needs to be reproduciable with the mainline Linux tree, not distro > > kernel or custom kernel. > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217890 > > > Signed-off-by: Adam Alves <adamoa@xxxxxxxxx> > > > --- > > > > <cover letter> > > > > OK, I'll try to check what is done here but please re-read > > "describing your changes" before sending next version: > > > > https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes > > > > > v1->v2: fix formatting issues and simplified tpm_chip_stop code. > > > > > > drivers/char/tpm/tpm-chip.c | 12 +++++++++++ > > > drivers/char/tpm/tpm-interface.c | 37 ++++++++++++++++++++++++++++++++ > > > drivers/char/tpm/tpm.h | 1 + > > > include/linux/tpm.h | 1 + > > > 4 files changed, 51 insertions(+) > > > > > > diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c > > > index 42b1062e33cd..a183e1355289 100644 > > > --- a/drivers/char/tpm/tpm-chip.c > > > +++ b/drivers/char/tpm/tpm-chip.c > > > @@ -137,6 +137,12 @@ EXPORT_SYMBOL_GPL(tpm_chip_start); > > > */ > > > void tpm_chip_stop(struct tpm_chip *chip) > > > { > > > + if (chip->flags & TPM_CHIP_FLAG_PRESERVE_LOCALITY) { > > > > The commit message did not explain what this flag is and what is its > > purpose. > > > > Also why you need to populate global flag inside chip, or the value > > of it? > > > > Why this is not just: > > > > if (tpm_locality_on_suspend) { > > ? > > > > > > > + if (chip->locality != 0) > > > + tpm_request_locality(chip); > > > > This will unconditionally skip calling tpm_request_locality() because > > Linux only uses locality 0. Not sure what good does this make. > > > > > + return; > > > + } > > > + > > > tpm_go_idle(chip); > > > tpm_relinquish_locality(chip); > > > tpm_clk_disable(chip); > > > @@ -291,6 +297,9 @@ int tpm_class_shutdown(struct device *dev) > > > { > > > struct tpm_chip *chip = container_of(dev, struct tpm_chip, dev); > > > > > > + if (tpm_locality_on_suspend) > > > + chip->flags |= TPM_CHIP_FLAG_PRESERVE_LOCALITY; > > > + > > > down_write(&chip->ops_sem); > > > if (chip->flags & TPM_CHIP_FLAG_TPM2) { > > > if (!tpm_chip_start(chip)) { > > > @@ -668,6 +677,9 @@ EXPORT_SYMBOL_GPL(tpm_chip_register); > > > */ > > > void tpm_chip_unregister(struct tpm_chip *chip) > > > { > > > + if (tpm_locality_on_suspend) > > > + chip->flags |= TPM_CHIP_FLAG_PRESERVE_LOCALITY; > > > + > > > tpm_del_legacy_sysfs(chip); > > > if (tpm_is_hwrng_enabled(chip)) > > > hwrng_unregister(&chip->hwrng); > > > diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c > > > index 66b16d26eecc..7f770ea98402 100644 > > > --- a/drivers/char/tpm/tpm-interface.c > > > +++ b/drivers/char/tpm/tpm-interface.c > > > @@ -26,6 +26,7 @@ > > > #include <linux/suspend.h> > > > #include <linux/freezer.h> > > > #include <linux/tpm_eventlog.h> > > > +#include <linux/dmi.h> > > > > > > #include "tpm.h" > > > > > > @@ -382,6 +383,36 @@ int tpm_auto_startup(struct tpm_chip *chip) > > > return rc; > > > } > > > > > > +/* > > > + * Bug workaround - some boards expect the TPM to be on Locality 0 > > > + * before suspend/shutdown, halting the system otherwise before > > > + * suspend and shutdown. Change suspend behavior for these cases. > > > + */ > > > +bool tpm_locality_on_suspend; > > > +module_param_named(locality_on_suspend, tpm_locality_on_suspend, bool, 0644); > > > +MODULE_PARM_DESC(locality_on_suspend, > > > + "Put TPM at locality 0 before suspend/shutdown."); > > > + > > > +static int __init tpm_set_locality_on_suspend(const struct dmi_system_id *system_id) > > > +{ > > > + pr_info("Board %s: TPM locality preserved before suspend/shutdown.\n", > > > + system_id->ident); > > > > Please remove pr_info(), we do not want to bloat klog. > > > > > + tpm_locality_on_suspend = true; > > > + > > > + return 0; > > > +} > > > + > > > +static const struct dmi_system_id tpm_board_quirks[] __initconst = { > > > > The commit message did not introduce this. Also should have inline > > documentation. > > > > /* > > * What the heck this. > > */ > > > > > + { > > > + .ident = "TUF GAMING B460M-PLUS", > > > + .matches = { > > > + DMI_MATCH(DMI_BOARD_VENDOR, "ASUSTeK COMPUTER INC."), > > > + DMI_MATCH(DMI_BOARD_NAME, "TUF GAMING B460M-PLUS"), > > > + }, > > > + .callback = tpm_set_locality_on_suspend, > > > + }, > > > +}; > > > + > > > /* > > > * We are about to suspend. Save the TPM state > > > * so that it can be restored. > > > @@ -394,6 +425,9 @@ int tpm_pm_suspend(struct device *dev) > > > if (!chip) > > > return -ENODEV; > > > > > > + if (tpm_locality_on_suspend) > > > + chip->flags |= TPM_CHIP_FLAG_PRESERVE_LOCALITY; > > > + > > > if (chip->flags & TPM_CHIP_FLAG_ALWAYS_POWERED) > > > goto suspended; > > > > > > @@ -431,6 +465,7 @@ int tpm_pm_resume(struct device *dev) > > > if (chip == NULL) > > > return -ENODEV; > > > > > > + chip->flags &= ~TPM_CHIP_FLAG_PRESERVE_LOCALITY; > > > chip->flags &= ~TPM_CHIP_FLAG_SUSPENDED; > > > > > > /* > > > @@ -476,6 +511,8 @@ static int __init tpm_init(void) > > > { > > > int rc; > > > > > > + dmi_check_system(tpm_board_quirks); > > > + > > > rc = class_register(&tpm_class); > > > if (rc) { > > > pr_err("couldn't create tpm class\n"); > > > diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h > > > index 61445f1dc46d..f2657b611b81 100644 > > > --- a/drivers/char/tpm/tpm.h > > > +++ b/drivers/char/tpm/tpm.h > > > @@ -236,6 +236,7 @@ extern dev_t tpm_devt; > > > extern const struct file_operations tpm_fops; > > > extern const struct file_operations tpmrm_fops; > > > extern struct idr dev_nums_idr; > > > +extern bool tpm_locality_on_suspend; > > > > > > ssize_t tpm_transmit(struct tpm_chip *chip, u8 *buf, size_t bufsiz); > > > int tpm_get_timeouts(struct tpm_chip *); > > > diff --git a/include/linux/tpm.h b/include/linux/tpm.h > > > index 4ee9d13749ad..1fbb33f386d1 100644 > > > --- a/include/linux/tpm.h > > > +++ b/include/linux/tpm.h > > > @@ -284,6 +284,7 @@ enum tpm_chip_flags { > > > TPM_CHIP_FLAG_FIRMWARE_UPGRADE = BIT(7), > > > TPM_CHIP_FLAG_SUSPENDED = BIT(8), > > > TPM_CHIP_FLAG_HWRNG_DISABLED = BIT(9), > > > + TPM_CHIP_FLAG_PRESERVE_LOCALITY = BIT(10), > > > }; > > > > > > #define to_tpm_chip(d) container_of(d, struct tpm_chip, dev) > > > > > > BR, Jarkko > > > > -- > Adam Oliveira Alves -- Adam Oliveira Alves