Re: [Bug 219383] New: System reboot on S3 sleep/wakeup test

Looks like TPM. CCing the proper people.

On Mon, Oct 14, 2024 at 12:46:26AM +0000, bugzilla-daemon@xxxxxxxxxx wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219383
> 
>             Bug ID: 219383
>            Summary: System reboot on S3 sleep/wakeup test
>            Product: Platform Specific/Hardware
>            Version: 2.5
>           Hardware: All
>                 OS: Linux
>             Status: NEW
>           Severity: normal
>           Priority: P3
>          Component: x86-64
>           Assignee: platform_x86_64@xxxxxxxxxxxxxxxxxxxx
>           Reporter: mikeseohyungjin@xxxxxxxxx
>         Regression: No
> 
> I work on LG laptops and have been running several LG PCs with Ubuntu. As you
> may know, most LG laptops use Intel SoCs.
> I found a critical issue: the system reboots during S3 sleep/wakeup.
> 
> Environment:
> - PC BIOS: Phoenix Technologies
> - Intel Jasperlake or Intel Lunarlake
> - OS: Ubuntu 22.04 (Jasperlake), 24.04.1 (Lunarlake)
> - Linux kernel: 6.x.0 (Jasperlake), or the up-to-date 6.11 (Lunarlake)
> 
> Symptom:
> 
> Running an aging script like the one below makes the system reboot.
> -------------------------
> #!/bin/bash
> <snip>
> for (( i=1; i<=10000; i++ )); do
>     sudo rtcwake -m mem -s 10 >> "${LOG}" 2>&1
> done
> <snip>
> -------------------------
> The script works as follows:
> 1. Waits 10 seconds.
> 2. Suspends to RAM (echo mem > /sys/power/state).
> 3. Waits another 10 seconds, then wakes the system up, as if the power
>    button had been pressed.
> 
> 
> My analysis:
> 
> I reproduced the issue several times and found that the reboot is triggered
> on the BIOS side:
> | pm_suspend() | syscore_suspend() | acpi_suspend_enter() | ... | <BIOS> |
> | ... | acpi_suspend_enter() | syscore_resume() | ... |
> 
> Debugging the BIOS showed that its TPM2 code can generate a cold reset when it
> detects something wrong after the TPM resumes.
> In the BIOS code, if there are active PCR banks that are not supported by the
> platform mask, it is supposed to update the TPM PCR allocations and reboot the
> machine.
> 
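
The firmware decision described above boils down to a bitmask test. A minimal sketch, assuming each PCR bank is represented by one bit; the bank names, bit values and the function name needs_realloc_and_reset() are hypothetical and are not taken from the Phoenix BIOS code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bank bits for illustration; real firmware has its own encoding. */
enum pcr_bank {
    BANK_SHA1   = 1u << 0,
    BANK_SHA256 = 1u << 1,
    BANK_SHA384 = 1u << 2,
    BANK_SM3    = 1u << 3,
};

/*
 * Hypothetical model of the resume-time check: if any active PCR bank
 * falls outside the platform's supported mask, the firmware reallocates
 * the banks and forces a cold reset (returns true).
 */
bool needs_realloc_and_reset(uint32_t active_banks, uint32_t platform_mask)
{
    return (active_banks & ~platform_mask) != 0;
}
```
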
> This means that something on the Linux kernel side can affect TPM operation
> when going to sleep.
> So I debugged and traced the TPM-related functions, such as tpm_chip_start(),
> whenever the symptom occurred.
> 
> In the normal case, tpm_chip_start() is called once:
>  tpm_pm_suspend() -> tpm_chip_start()
> but in the failing case, it is additionally called via:
>  hwrng_fillfn ->
>   rng_get_data ->
>     tpm_hwrng_read ->
>       tpm_get_random ->
>         tpm_find_get_ops ->
>            tpm_try_get_ops ->
>              tpm_chip_start ->
> 
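
One way to keep the second path from reaching tpm_chip_start() is to have the suspend path and the hwrng read path serialize on the same lock and check a "suspended" flag under it. A userspace sketch of that idea, assuming a pthread mutex stands in for the TPM chip lock; struct model_chip, model_pm_suspend(), model_pm_resume(), model_hwrng_read() and the -EAGAIN return value are all hypothetical and are not the TPM driver's API:

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a TPM chip; not kernel code. */
struct model_chip {
    pthread_mutex_t lock;   /* plays the role of the chip lock */
    bool suspended;         /* only read or written while holding lock */
    int start_count;        /* how many times model_chip_start() ran */
};

/* Stands in for tpm_chip_start(); here it only counts calls. */
static void model_chip_start(struct model_chip *c)
{
    c->start_count++;
}

/* Suspend path: take the lock first, start the chip once, mark it suspended. */
void model_pm_suspend(struct model_chip *c)
{
    pthread_mutex_lock(&c->lock);
    model_chip_start(c);
    c->suspended = true;
    pthread_mutex_unlock(&c->lock);
}

/* Resume path: clear the flag under the same lock. */
void model_pm_resume(struct model_chip *c)
{
    pthread_mutex_lock(&c->lock);
    c->suspended = false;
    pthread_mutex_unlock(&c->lock);
}

/* hwrng read path: check the flag under the lock before touching the chip. */
int model_hwrng_read(struct model_chip *c)
{
    int ret = 0;

    pthread_mutex_lock(&c->lock);
    if (c->suspended)
        ret = -EAGAIN;      /* back off instead of starting the chip */
    else
        model_chip_start(c);
    pthread_mutex_unlock(&c->lock);
    return ret;
}
```

The ordering is the point of the sketch: because the flag is tested while holding the lock the suspend path also takes, a read either completes before suspend marks the chip, or sees the flag and backs off; it never starts the chip after the suspend path has run. The mutex must be initialized (for example with PTHREAD_MUTEX_INITIALIZER) before use.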
> I found that when hwrng_fillfn(), the hardware random number generator fill
> thread, runs during system sleep, it can cause the system to reboot.
> To verify this, I tested a custom kernel that includes the patch below.
> 
> -----------------------
> From 373e92bb6d471c5fb42bacb97a4caf5375df5522 Mon Sep 17 00:00:00 2001
> From: mike Seo <mikeseohyungjin@xxxxxxxxx>
> Date: Thu, 10 Oct 2024 14:04:57 +0900
> Subject: [PATCH] test_patch
> 
> test_patch for reboot while sleep/wakeup
> 
> Signed-off-by: mike Seo <mikeseohyungjin@xxxxxxxxx>
> ---
>  drivers/char/hw_random/core.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index 57c51efa5..d3f0059a4 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -25,6 +25,7 @@
>  #include <linux/slab.h>
>  #include <linux/string.h>
>  #include <linux/uaccess.h>
> +#include <linux/suspend.h>
> 
>  #define RNG_MODULE_NAME                "hw_random"
> 
> @@ -469,6 +470,22 @@ static struct attribute *rng_dev_attrs[] = {
> 
>  ATTRIBUTE_GROUPS(rng_dev);
> 
> +
> +static int hwrng_pm_notification(struct notifier_block *nb, unsigned long action, void *data)
> +{
> +
> +       switch (action) {
> +       case PM_SUSPEND_PREPARE:
> +               is_suspend_prepare = 1;
> +               break;
> +       default:
> +               is_suspend_prepare = 0;
> +               break;
> +       }
> +       return 0;
> +}
> +
> +static struct notifier_block pm_notifier = { .notifier_call = hwrng_pm_notification };
>  static int hwrng_fillfn(void *unused)
>  {
>         size_t entropy, entropy_credit = 0; /* in 1/1024 of a bit */
> @@ -478,6 +495,9 @@ static int hwrng_fillfn(void *unused)
>                 unsigned short quality;
>                 struct hwrng *rng;
> 
> +               while (is_suspend_prepare)
> +                       msleep(500);
> +
>                 rng = get_current_rng();
>                 if (IS_ERR(rng) || !rng)
>                         break;
> @@ -549,6 +569,7 @@ int hwrng_register(struct hwrng *rng)
>                         goto out_unlock;
>         }
>         mutex_unlock(&rng_mutex);
> +       WARN_ON(register_pm_notifier(&pm_notifier));
>         return 0;
>  out_unlock:
>         mutex_unlock(&rng_mutex);
> -- 
> 2.43.0
> ------------------------
> 
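
The gating in the test patch can be modelled in userspace as below. A minimal sketch, assuming C11 atomics stand in for whatever the kernel code would actually use; the MODEL_PM_* constants are hypothetical local stand-ins for PM_SUSPEND_PREPARE and PM_POST_SUSPEND (their values are not the kernel's), and model_pm_notification() and model_fill_may_run() are hypothetical names mirroring hwrng_pm_notification() and the wait loop in hwrng_fillfn(). It also declares is_suspend_prepare, which the quoted diff uses but never declares:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the PM notifier actions; values are not the kernel's. */
enum model_pm_action {
    MODEL_PM_SUSPEND_PREPARE,
    MODEL_PM_POST_SUSPEND,
};

/* Shared between the notifier and the fill loop, so it is atomic here. */
static atomic_bool is_suspend_prepare = false;

/* Mirrors hwrng_pm_notification(): set on prepare, clear on any other action. */
int model_pm_notification(enum model_pm_action action)
{
    switch (action) {
    case MODEL_PM_SUSPEND_PREPARE:
        atomic_store(&is_suspend_prepare, true);
        break;
    default:
        atomic_store(&is_suspend_prepare, false);
        break;
    }
    return 0;
}

/* What the fill loop would check before calling get_current_rng(). */
bool model_fill_may_run(void)
{
    return !atomic_load(&is_suspend_prepare);
}
```

If something along these lines became a real patch, register_pm_notifier() would presumably need to be called only once rather than on every hwrng_register() call, since the same notifier_block should not be added to the notifier chain twice.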
> With this patch applied, the system passed more than 10000 iterations of the
> S3 sleep/wakeup aging test.
> 
> Could you prepare proper patches for this issue and merge them?
> 
> Thank you,
> Mike
> 
> -- 
> You may reply to this email to add a comment.
> 
> You are receiving this mail because:
> You are watching the assignee of the bug.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette



