Fixing a typo: I patched the 6.8.0 kernel. Em qui., 14 de mar. de 2024 às 13:31, Adam Alves <adamoa@xxxxxxxxx> escreveu: > > Hi Jarkko, > > I have an update here. I would like you to check if it makes sense > before I submit a patch. > > The problem might be related to the chip itself which leaves the idle > state whenever the locality is relinquished. > > I probed the chip while operating and noted that the > TPM_CRB_CTRL_STS_0.tpmIdle bit (located at `regs_t->ctrl_sts` on > `crb_priv` structure in tpm_crb.c) is always cleared whenever the > locality is relinquished. > > The result is that this chip never becomes idle since after going idle > the locality is relinquished (`tpm_chip_stop` function in tpm-chip.c). > > I patched the 7.8.0 kernel for bypassing the `tpm_relinquish_locality` > call and so far the PC is suspending and power off normally, > TPM_CRB_CTRL_STS_0.tpmIdle remains set whenever the device is not used > now. > > I believe this is related to this specific chip, if the fix remains > working for 2 weeks I will submit the patch. Please advise me if it > makes sense. > > Best regards, > Adam > > Em qua., 13 de mar. de 2024 às 14:02, Adam Alves <adamoa@xxxxxxxxx> escreveu: > > > > Hi Jarkko, > > > > Thank you very much for kindly reviewing this proposal. > > > > After one week without any issues with my PC hanging, it happened > > again. It seems that the fix I am proposing is not final (it only > > reduced the frequency since it always happened when I shutdown after > > couple hours of power up time and now it only happened after two weeks > > with a similar usage rate). > > > > I will share with you the data you requested below. > > > > > The lacking information here is the CPU model (/proc/cpuinfo), on > > > which kernel version the bug was produced and what kind of TPM the > > > system has (discrete chip or firmware TPM should be easy to check > > > from BIOS). > > > > CPU model: Intel(R) Core(TM) i7-10700F CPU @ 2.90GHz > > I am attaching data from /proc/cpuinfo > > > > TPM: No info on mainboard documentation regarding TPM. BIOS is not > > clear whether or not it is discrete or firmware. Based on dmidecode > > (attached) I get the following: > > TPM Device > > Vendor ID: INTC > > Specification Version: 2.0 > > Firmware Revision: 500.16 > > Description: INTEL > > Characteristics: > > Family configurable via platform software support > > OEM-specific Information: 0x00000000 > > I also extracted TPM_CRB_INTF_ID_0 from the TPM: `a13a808600084311` > > (Vendor ID 8086, Device ID a13a, Revision ID 00). The only match I > > found while browsing for this device ID is 100 Series/C230 Series > > Chipset Family MEI Controller #1, which is a PCI device, so it might > > not be related to the TPM. > > > > The driver bound to the tpm0 device is tpm_crb. The disassembled TPM2 > > ACPI table is also attached in case it helps. > > > > The bug was reproduced from upstream kernel version 6.8.0 (attached > > build .config that I used). > > > > > Also, which firmwre version you have and have you tested with the > > > most up to date firmware (BIOS)? > > > > I have the most updated firmware provided by ASUS: TUF GAMING > > B460M-PLUS BIOS 1601 > > > > > What is "the ACPI command"? Refer to concrete items instead of > > > asking to guess what you is the ACPI command for you. > > > > I enabled ACPI_DEBUG on my kernel to know where the it was actually > > hanging. The last function is actually the last function that should > > be called by the kernel for a successful shutdown: > > hwsleep-0078 hw_legacy_sleep : Entering sleep state [S5]^M > > hwregs-0460 hw_write_pm1_control : ----Entry^M > > hwvalid-0097 hw_validate_io_request: ----Entry^M > > hwvalid-0111 hw_validate_io_request: Address 0000000000001804 > > LastAddress 0000000000001805 Length 2 hwvalid-0128 > > hw_validate_io_request: ----Exit- AE_OK^M > > hwregs-0360 hw_write : Wrote: 0000000000001C01 width 16 > > to 0000000000001804 (SystemIO)^M > > hwregs-0473 hw_write_pm1_control : ----Exit- AE_OK^M > > hwregs-0460 hw_write_pm1_control : ----Entry^M > > hwvalid-0097 hw_validate_io_request: ----Entry^M > > hwvalid-0111 hw_validate_io_request: Address 0000000000001804 > > LastAddress 0000000000001805 Length 2 hwvalid-0128 > > hw_validate_io_request: ----Exit- AE_OK^M > > > > It is writing both SLP_TYP + SLP_EN to ACPI PM1b_CNT registers (as > > expected by specification). I checked the flags and it is in line with > > the flags required by the system ACPI tables. > > > > I understand from that that the system is hanging after ACPI firmware > > takes over. The same issue happens if I force a EFI shutdown. > > > > Since the the BUG has appeared again even with the fix implemented, I > > am holding this patch for now until I find a solution that permanently > > fixes the issue. The next time I try to submit a patch that will > > comply with all your suggestions, thank you very much. > > > > I would appreciate if you had any hint on how I could keep digging to > > find the issue that might be causing this bug. This is an issue that > > only happens when I shutdown from Linux and my TPM is activated in > > BIOS. That's why my guess is that this is what should be causing it. > > From Windows, shutdown is always flawless. > > > > Best regards, > > Adam > > > > > > > > > chip expecting it to be in Locality 0 as expected by TCG PC Client > > > > Platform Firmware Profile Version 1.06 Revision 52 (3.1.1 – Pre-OS > > > > Environment) and then when it fails to do so it simply halts the > > > > whole system. > > > > > > We don't speculate about the root cause here, only document it. > > > Please move this paragraph before diffstat (see below)> > > > > > > > Enable a user to configure the kernel through > > > > “tpm.locality_on_suspend=1” boot parameter so that the locality is set > > > > before suspend/shutdown in order to diagnose whether or not the board is > > > > one of the buggy ones that require this workaround. Since this bug is > > > > related to the board/platform instead of the specific TPM chip, call > > > > dmi_check_system on the tpm_init function so that this setting is > > > > automatically enabled for boards specified in code (ASUS TUF GAMING > > > > B460M-PLUS already included) – automatic configuration only works in > > > > case CONFIG_DMI is set though, since dmi_check_system is a non-op when > > > > CONFIG_DMI is not set. > > > > > > Please describe what the *kernel command-line" (for clarity > > > sake) semantically means. > > > > > > Also please remove anything about diangnosing. We care only > > > about fixes. > > > > > > > > > > > In case “tpm.locality_on_suspend=0” (the default) don't change any > > > > behavior thus preserving current functionality of any other board > > > > except ASUSTeK COMPUTER INC. TUF GAMING B460M-PLUS and possibly future > > > > boards as we successfully diagnose other boards with the same issue > > > > fixed by using “tpm.locality_on_suspend=1”. > > > > > > This neither documents the default value. I'm also lost did setting > > > this "1" or "0" fix the issue in your case? > > > > > > So: firmware version and being up-to-date is important and also this > > > needs to be reproduciable with the mainline Linux tree, not distro > > > kernel or custom kernel. > > > > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217890 > > > > Signed-off-by: Adam Alves <adamoa@xxxxxxxxx> > > > > --- > > > > > > <cover letter> > > > > > > OK, I'll try to check what is done here but please re-read > > > "describing your changes" before sending next version: > > > > > > https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes > > > > > > > v1->v2: fix formatting issues and simplified tpm_chip_stop code. > > > > > > > > drivers/char/tpm/tpm-chip.c | 12 +++++++++++ > > > > drivers/char/tpm/tpm-interface.c | 37 ++++++++++++++++++++++++++++++++ > > > > drivers/char/tpm/tpm.h | 1 + > > > > include/linux/tpm.h | 1 + > > > > 4 files changed, 51 insertions(+) > > > > > > > > diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c > > > > index 42b1062e33cd..a183e1355289 100644 > > > > --- a/drivers/char/tpm/tpm-chip.c > > > > +++ b/drivers/char/tpm/tpm-chip.c > > > > @@ -137,6 +137,12 @@ EXPORT_SYMBOL_GPL(tpm_chip_start); > > > > */ > > > > void tpm_chip_stop(struct tpm_chip *chip) > > > > { > > > > + if (chip->flags & TPM_CHIP_FLAG_PRESERVE_LOCALITY) { > > > > > > The commit message did not explain what this flag is and what is its > > > purpose. > > > > > > Also why you need to populate global flag inside chip, or the value > > > of it? > > > > > > Why this is not just: > > > > > > if (tpm_locality_on_suspend) { > > > ? > > > > > > > > > > + if (chip->locality != 0) > > > > + tpm_request_locality(chip); > > > > > > This will unconditionally skip calling tpm_request_locality() because > > > Linux only uses locality 0. Not sure what good does this make. > > > > > > > + return; > > > > + } > > > > + > > > > tpm_go_idle(chip); > > > > tpm_relinquish_locality(chip); > > > > tpm_clk_disable(chip); > > > > @@ -291,6 +297,9 @@ int tpm_class_shutdown(struct device *dev) > > > > { > > > > struct tpm_chip *chip = container_of(dev, struct tpm_chip, dev); > > > > > > > > + if (tpm_locality_on_suspend) > > > > + chip->flags |= TPM_CHIP_FLAG_PRESERVE_LOCALITY; > > > > + > > > > down_write(&chip->ops_sem); > > > > if (chip->flags & TPM_CHIP_FLAG_TPM2) { > > > > if (!tpm_chip_start(chip)) { > > > > @@ -668,6 +677,9 @@ EXPORT_SYMBOL_GPL(tpm_chip_register); > > > > */ > > > > void tpm_chip_unregister(struct tpm_chip *chip) > > > > { > > > > + if (tpm_locality_on_suspend) > > > > + chip->flags |= TPM_CHIP_FLAG_PRESERVE_LOCALITY; > > > > + > > > > tpm_del_legacy_sysfs(chip); > > > > if (tpm_is_hwrng_enabled(chip)) > > > > hwrng_unregister(&chip->hwrng); > > > > diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c > > > > index 66b16d26eecc..7f770ea98402 100644 > > > > --- a/drivers/char/tpm/tpm-interface.c > > > > +++ b/drivers/char/tpm/tpm-interface.c > > > > @@ -26,6 +26,7 @@ > > > > #include <linux/suspend.h> > > > > #include <linux/freezer.h> > > > > #include <linux/tpm_eventlog.h> > > > > +#include <linux/dmi.h> > > > > > > > > #include "tpm.h" > > > > > > > > @@ -382,6 +383,36 @@ int tpm_auto_startup(struct tpm_chip *chip) > > > > return rc; > > > > } > > > > > > > > +/* > > > > + * Bug workaround - some boards expect the TPM to be on Locality 0 > > > > + * before suspend/shutdown, halting the system otherwise before > > > > + * suspend and shutdown. Change suspend behavior for these cases. > > > > + */ > > > > +bool tpm_locality_on_suspend; > > > > +module_param_named(locality_on_suspend, tpm_locality_on_suspend, bool, 0644); > > > > +MODULE_PARM_DESC(locality_on_suspend, > > > > + "Put TPM at locality 0 before suspend/shutdown."); > > > > + > > > > +static int __init tpm_set_locality_on_suspend(const struct dmi_system_id *system_id) > > > > +{ > > > > + pr_info("Board %s: TPM locality preserved before suspend/shutdown.\n", > > > > + system_id->ident); > > > > > > Please remove pr_info(), we do not want to bloat klog. > > > > > > > + tpm_locality_on_suspend = true; > > > > + > > > > + return 0; > > > > +} > > > > + > > > > +static const struct dmi_system_id tpm_board_quirks[] __initconst = { > > > > > > The commit message did not introduce this. Also should have inline > > > documentation. > > > > > > /* > > > * What the heck this. > > > */ > > > > > > > + { > > > > + .ident = "TUF GAMING B460M-PLUS", > > > > + .matches = { > > > > + DMI_MATCH(DMI_BOARD_VENDOR, "ASUSTeK COMPUTER INC."), > > > > + DMI_MATCH(DMI_BOARD_NAME, "TUF GAMING B460M-PLUS"), > > > > + }, > > > > + .callback = tpm_set_locality_on_suspend, > > > > + }, > > > > +}; > > > > + > > > > /* > > > > * We are about to suspend. Save the TPM state > > > > * so that it can be restored. > > > > @@ -394,6 +425,9 @@ int tpm_pm_suspend(struct device *dev) > > > > if (!chip) > > > > return -ENODEV; > > > > > > > > + if (tpm_locality_on_suspend) > > > > + chip->flags |= TPM_CHIP_FLAG_PRESERVE_LOCALITY; > > > > + > > > > if (chip->flags & TPM_CHIP_FLAG_ALWAYS_POWERED) > > > > goto suspended; > > > > > > > > @@ -431,6 +465,7 @@ int tpm_pm_resume(struct device *dev) > > > > if (chip == NULL) > > > > return -ENODEV; > > > > > > > > + chip->flags &= ~TPM_CHIP_FLAG_PRESERVE_LOCALITY; > > > > chip->flags &= ~TPM_CHIP_FLAG_SUSPENDED; > > > > > > > > /* > > > > @@ -476,6 +511,8 @@ static int __init tpm_init(void) > > > > { > > > > int rc; > > > > > > > > + dmi_check_system(tpm_board_quirks); > > > > + > > > > rc = class_register(&tpm_class); > > > > if (rc) { > > > > pr_err("couldn't create tpm class\n"); > > > > diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h > > > > index 61445f1dc46d..f2657b611b81 100644 > > > > --- a/drivers/char/tpm/tpm.h > > > > +++ b/drivers/char/tpm/tpm.h > > > > @@ -236,6 +236,7 @@ extern dev_t tpm_devt; > > > > extern const struct file_operations tpm_fops; > > > > extern const struct file_operations tpmrm_fops; > > > > extern struct idr dev_nums_idr; > > > > +extern bool tpm_locality_on_suspend; > > > > > > > > ssize_t tpm_transmit(struct tpm_chip *chip, u8 *buf, size_t bufsiz); > > > > int tpm_get_timeouts(struct tpm_chip *); > > > > diff --git a/include/linux/tpm.h b/include/linux/tpm.h > > > > index 4ee9d13749ad..1fbb33f386d1 100644 > > > > --- a/include/linux/tpm.h > > > > +++ b/include/linux/tpm.h > > > > @@ -284,6 +284,7 @@ enum tpm_chip_flags { > > > > TPM_CHIP_FLAG_FIRMWARE_UPGRADE = BIT(7), > > > > TPM_CHIP_FLAG_SUSPENDED = BIT(8), > > > > TPM_CHIP_FLAG_HWRNG_DISABLED = BIT(9), > > > > + TPM_CHIP_FLAG_PRESERVE_LOCALITY = BIT(10), > > > > }; > > > > > > > > #define to_tpm_chip(d) container_of(d, struct tpm_chip, dev) > > > > > > > > > BR, Jarkko > > > > > > > > -- > > Adam Oliveira Alves > > > > -- > Adam Oliveira Alves -- Adam Oliveira Alves