Hi, this is your Linux kernel regression tracker. On 28.11.22 20:56, Jason A. Donenfeld wrote: BTW, many thx for taking care of this Jason! > From: Jan Dabros <jsd@xxxxxxxxxxxx> > > Currently tpm transactions are executed unconditionally in > tpm_pm_suspend() function, which may lead to races with other tpm > accessors in the system. Specifically, the hw_random tpm driver makes > use of tpm_get_random(), and this function is called in a loop from a > kthread, which means it's not frozen alongside userspace, and so can > race with the work done during system suspend: Peter, Jarkko, did you look at this patch or even applied it already to send it to Linus soon? Doesn't look like it from here, but maybe I missed something. Thing is: the linked regression afaics is overdue fixing (for details see "Prioritize work on fixing regressions" in https://www.kernel.org/doc/html/latest/process/handling-regressions.html ). Hence if this doesn't make any progress I'll likely have to point Linus to this patch and suggest to apply it directly if it looks okay from his perspective. Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight. > [ 3.277834] tpm tpm0: tpm_transmit: tpm_recv: error -52 > [ 3.278437] tpm tpm0: invalid TPM_STS.x 0xff, dumping stack for forensics > [ 3.278445] CPU: 0 PID: 1 Comm: init Not tainted 6.1.0-rc5+ #135 > [ 3.278450] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014 > [ 3.278453] Call Trace: > [ 3.278458] <TASK> > [ 3.278460] dump_stack_lvl+0x34/0x44 > [ 3.278471] tpm_tis_status.cold+0x19/0x20 > [ 3.278479] tpm_transmit+0x13b/0x390 > [ 3.278489] tpm_transmit_cmd+0x20/0x80 > [ 3.278496] tpm1_pm_suspend+0xa6/0x110 > [ 3.278503] tpm_pm_suspend+0x53/0x80 > [ 3.278510] __pnp_bus_suspend+0x35/0xe0 > [ 3.278515] ? pnp_bus_freeze+0x10/0x10 > [ 3.278519] __device_suspend+0x10f/0x350 > > Fix this by calling tpm_try_get_ops(), which itself is a wrapper around > tpm_chip_start(), but takes the appropriate mutex. > > Signed-off-by: Jan Dabros <jsd@xxxxxxxxxxxx> > Reported-by: Vlastimil Babka <vbabka@xxxxxxx> > Tested-by: Jason A. Donenfeld <Jason@xxxxxxxxx> > Tested-by: Vlastimil Babka <vbabka@xxxxxxx> > Link: https://lore.kernel.org/all/c5ba47ef-393f-1fba-30bd-1230d1b4b592@xxxxxxx/ > Cc: stable@xxxxxxxxxxxxxxx > Fixes: e891db1a18bf ("tpm: turn on TPM on suspend for TPM 1.x") > [Jason: reworked commit message, added metadata] > Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx> > --- > drivers/char/tpm/tpm-interface.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c > index 1621ce818705..d69905233aff 100644 > --- a/drivers/char/tpm/tpm-interface.c > +++ b/drivers/char/tpm/tpm-interface.c > @@ -401,13 +401,14 @@ int tpm_pm_suspend(struct device *dev) > !pm_suspend_via_firmware()) > goto suspended; > > - if (!tpm_chip_start(chip)) { > + rc = tpm_try_get_ops(chip); > + if (!rc) { > if (chip->flags & TPM_CHIP_FLAG_TPM2) > tpm2_shutdown(chip, TPM2_SU_STATE); > else > rc = tpm1_pm_suspend(chip, tpm_suspend_pcr); > > - tpm_chip_stop(chip); > + tpm_put_ops(chip); > } > > suspended: