On 1/16/23 12:44, Jarkko Sakkinen wrote:
> On Fri, Jan 06, 2023 at 04:01:56AM +0100, Jason A. Donenfeld wrote:
>> TPM 1 is sometimes broken across system suspends, due to races or
>> locking issues or something else that haven't been diagnosed or fixed
>> yet, most likely having to do with concurrent reads from the TPM's
>> hardware random number generator driver. These issues prevent the system
>> from actually suspending, with errors like:
>>
>> tpm tpm0: A TPM error (28) occurred continue selftest
>> ...
>
> <REMOVE>
>
>> tpm tpm0: A TPM error (28) occurred attempting get random
>> ...
>> tpm tpm0: Error (28) sending savestate before suspend
>> tpm_tis 00:08: PM: __pnp_bus_suspend(): tpm_pm_suspend+0x0/0x80 returns 28
>> tpm_tis 00:08: PM: dpm_run_callback(): pnp_bus_suspend+0x0/0x10 returns 28
>> tpm_tis 00:08: PM: failed to suspend: error 28
>> PM: Some devices failed to suspend, or early wake event detected
>
> </REMOVE>
>
> Unrelated to thix particular fix.

Not sure I understand. AFAIK this is not a proper fix, but a workaround for
when laptop suspend no longer works because TPM fails to suspend. The error
messages quoted above are very much related to the problem of suspend not
working, and this patch did work as advertised at least for me. I see errors
but they don't prevent suspend anymore:

https://lore.kernel.org/all/58d7a42c-9e6b-ab2a-617f-d5e373bf63cb@xxxxxxx/

>> This issue was partially fixed by 23393c646142 ("char: tpm: Protect
>> tpm_pm_suspend with locks"), in a last minute 6.1 commit that Linus took
>> directly because the TPM maintainers weren't available. However, it
>> seems like this just addresses the most common cases of the bug, rather
>> than addressing it entirely. So there are more things to fix still,
>> apparently.
>>
>> In lieu of actually fixing the underlying bug, just allow system suspend
>> to continue, so that laptops still go to sleep fine. Later, this can be
>> reverted when the real bug is fixed.
>>
>> Link: https://lore.kernel.org/lkml/7cbe96cf-e0b5-ba63-d1b4-f63d2e826efa@xxxxxxx/
>> Cc: stable@xxxxxxxxxxxxxxx # 6.1+
>> Reported-by: Vlastimil Babka <vbabka@xxxxxxx>
>> Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx>
>> ---
>> This is basically untested and I haven't worked out if there are any
>> awful implications of letting the system sleep when TPM suspend fails.
>> Maybe some PCRs get cleared and that will make everything explode on
>> resume? Maybe it doesn't matter? Somebody well versed in TPMology should
>> probably [n]ack this approach.
>>
>> drivers/char/tpm/tpm-interface.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
>> index d69905233aff..6df9067ef7f9 100644
>> --- a/drivers/char/tpm/tpm-interface.c
>> +++ b/drivers/char/tpm/tpm-interface.c
>> @@ -412,7 +412,10 @@ int tpm_pm_suspend(struct device *dev)
>>  	}
>>
>>  suspended:
>> -	return rc;
>> +	if (rc)
>> +		pr_err("Unable to suspend tpm-%d (error %d), but continuing system suspend\n",
>> +		       chip->dev_num, rc);
>> +	return 0;
>>  }
>>  EXPORT_SYMBOL_GPL(tpm_pm_suspend);
>>
>> --
>> 2.39.0
>>
>
> This tpm_tis local issue, nothing to do with tpm_pm_suspend(). Executing
> the selftest as part of wake up, is TPM 1.2 dTPM specific requirement, and
> the call is located in tpm_tis_resume() [*].
>
> [*] https://lore.kernel.org/lkml/Y8U1QxA4GYvPWDky@xxxxxxxxxx/

Yes, the changelog at the top does say "due to races or locking issues or
something else that haven't been diagnosed or fixed yet".

I don't know what causes the TPM to start returning error 28 on resume and
never recover from it. But it didn't happen before hwrng started using the
TPM. Before that, it was probably only the selftest that ever did anything
with the TPM, and on its own I don't recall it ever (before 6.1) failing and
preventing further suspend/resume.

> BR, Jarkko
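
For anyone following along, here is a rough userspace sketch of why the quoted
one-line change lets suspend proceed. It is not kernel code: suspend_fn,
tpm_savestate_fails(), tpm_suspend_strict(), tpm_suspend_lenient() and
system_suspend() are all hypothetical names made up for illustration, and the
loop only models, in a very simplified way, the assumption that a non-zero
return from a device's suspend callback aborts the whole system suspend (as
the "failed to suspend: error 28" lines in the quoted log suggest).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a device suspend callback; not a kernel API. */
typedef int (*suspend_fn)(void);

/* Models a TPM whose savestate command keeps failing with error 28. */
static int tpm_savestate_fails(void)
{
	return 28;
}

/* Behaviour before the patch: the error is propagated to the caller. */
static int tpm_suspend_strict(void)
{
	return tpm_savestate_fails();
}

/* Behaviour after the patch: the error is logged, then swallowed. */
static int tpm_suspend_lenient(void)
{
	int rc = tpm_savestate_fails();

	if (rc)
		fprintf(stderr,
			"Unable to suspend tpm-%d (error %d), but continuing system suspend\n",
			0, rc);
	return 0;
}

/*
 * Simplified model of the suspend path: call each device callback in turn
 * and abort on the first non-zero return. Returns 0 if every device
 * suspended, otherwise the first error seen.
 */
static int system_suspend(const suspend_fn *callbacks, int count)
{
	for (int i = 0; i < count; i++) {
		assert(callbacks[i] != NULL);
		int rc = callbacks[i]();
		if (rc)
			return rc;
	}
	return 0;
}
```

Passing tpm_suspend_strict to system_suspend() yields the device error and the
modelled suspend is aborted; passing tpm_suspend_lenient yields 0 with only a
message on stderr. The sketch says nothing about whether the TPM state is still
sane on resume, which is the open question raised under the "---" above.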