On Fri Aug 18, 2023 at 8:57 PM EEST, Mario Limonciello wrote: > On 8/18/2023 12:53, Jarkko Sakkinen wrote: > > On Fri Aug 18, 2023 at 8:21 PM EEST, Mario Limonciello wrote: > >> On 8/18/2023 12:00, Jarkko Sakkinen wrote: > >>> On Fri Aug 18, 2023 at 4:58 AM EEST, Limonciello, Mario wrote: > >>>> > >>>> > >>>> On 8/17/2023 5:33 PM, Jarkko Sakkinen wrote: > >>>>> On Fri Aug 18, 2023 at 1:25 AM EEST, Todd Brandt wrote: > >>>>>> On Fri, 2023-08-18 at 00:47 +0300, Jarkko Sakkinen wrote: > >>>>>>> On Fri Aug 18, 2023 at 12:09 AM EEST, Todd Brandt wrote: > >>>>>>>> While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing > >>>>>>>> a > >>>>>>>> crash and reboot situation when S3 suspend is initiated. To > >>>>>>>> reproduce > >>>>>>>> it, this call is all that's required "sudo sleepgraph -m mem > >>>>>>>> -rtcwake > >>>>>>>> 15". > >>>>>>> > >>>>>>> 1. Are there logs available? > >>>>>>> 2. Is this the test case: https://pypi.org/project/sleepgraph/ (never > >>>>>>> used it before). > >>>>>> > >>>>>> There are no dmesg logs because the S3 crash wipes them out. Sleepgraph > >>>>>> isn't actually necessary to activate it, just an S3 suspend "echo mem > > >>>>>> /sys/power/state". > >>>>>> > >>>>>> So far it appears to only have affected test systems, not production > >>>>>> hardware, and none of them have TPM chips, so I'm beginning to wonder > >>>>>> if this patch just inadvertently activated a bug somewhere else in the > >>>>>> kernel that happens to affect test hardware. > >>>>>> > >>>>>> I'll continue to debug it, this isn't an emergency as so far I haven't > >>>>>> seen it in production hardware. > >>>>> > >>>>> OK, I'll still see if I could reproduce it just in case. > >>>>> > >>>>> BR, Jarkko > >>>> > >>>> I'd like to better understand what kind of TPM initialization path has > >>>> run. Does the machine have some sort of TPM that failed to fully > >>>> initialize perhaps? > >>>> > >>>> If you can't share a full bootup dmesg, can you at least share > >>>> > >>>> # dmesg | grep -i tpm > >>> > >>> It would be more useful perhaps to get full dmesg output after power on > >>> and before going into suspend. > >>> > >>> Also ftrace filter could be added to the kernel command-line: > >>> > >>> ftrace=function ftrace_filter=tpm* > >>> > >>> After bootup: > >>> > >>> mount -t tracefs nodev /sys/kernel/tracing > >>> cat /sys/kernel/tracing/trace > >>> > >>> BR, Jarkko > >> > >> Todd and I have gone back and forth a little bit on the bugzilla > >> (https://bugzilla.kernel.org/show_bug.cgi?id=217804), and it seems that > >> this isn't an S3 problem - it's a probing problem. > >> > >> [ 1.132521] tpm_crb: probe of INTC6001:00 failed with error 378 > >> > >> That error 378 specifically matches TPM2_CC_GET_CAPABILITY, which is the > >> same command that was being requested. This leads me to believe the TPM > >> isn't ready at the time of probing. > >> > >> In this case one solution is we could potentially ignore failures for > >> that tpm2_get_tpm_pt() call, but I think we should first understand why > >> it doesn't work at probing time for this TPM to ensure the actual quirk > >> isn't built on a house of cards. > > > > Given that there is nothing known broken (at the moment) in production, > > I think the following might be a reasonable change. > > > > BR, Jarkko > > > > Yeah that would prevent it. > > Here's a simpler change that I think should work too though: > diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c > index 9eb1a18590123..b0e9931fe436c 100644 > --- a/drivers/char/tpm/tpm_crb.c > +++ b/drivers/char/tpm/tpm_crb.c > @@ -472,8 +472,7 @@ static int crb_check_flags(struct tpm_chip *chip) > if (ret) > return ret; > > - ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL); > - if (ret) > + if (tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL)) > goto release; > > if (val == 0x414D4400U /* AMD */) > > I think Todd needs to check whether TPM works with that or not though. Hmm... I'm sorry if I have a blind spot now but what is that changing? BR, Jarkko