Re: amd_sfh driver causes kernel oops during boot

On 6/6/2023 3:08 AM, Benjamin Tissoires wrote:
On Jun 06 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
On 06.06.23 04:36, Bagas Sanjaya wrote:
On Mon, Jun 05, 2023 at 01:24:25PM +0200, Malte Starostik wrote:
Hello,

chiming in here as I'm experiencing what looks like the exact same issue, also
on a Lenovo Z13 notebook, also on Arch:
Oops during startup in task udev-worker, followed by udev-worker blocking all
attempts to suspend or cleanly shut down/reboot the machine. In fact, I first
noticed because the machine surprised me by repeatedly running out of battery
after it had supposedly been in standby but could not actually suspend. Only
then did I notice the error on boot.

bisect result:
904e28c6de083fa4834cdbd0026470ddc30676fc is the first bad commit
commit 904e28c6de083fa4834cdbd0026470ddc30676fc
Merge: a738688177dc 2f7f4efb9411
Author: Benjamin Tissoires <benjamin.tissoires@xxxxxxxxxx>
Date:   Wed Feb 22 10:44:31 2023 +0100

     Merge branch 'for-6.3/hid-bpf' into for-linus
Hmm, seems like a bad bisect (it landed on HID-BPF, which IMO isn't related
to amd_sfh). Can you repeat the bisection?
Well, amd_sfh afaics apparently interacts with HID (see trace earlier in
the thread), so it's not that far away. But it's a merge commit, which
is possible, but doesn't happen every day. So a recheck might really be
a good idea.
Let's not rule out that there is a bad interaction between HID-BPF and
AMD SFH. HID-BPF is able to process any incoming HID event, whether it
comes from AMD SFH, USB, BT, I2C or anything else.

However, looking at the stack trace in the initial report[0], it seems
we are getting the oops/stack traces while we are still in amd_sfh:

list_add corruption. next is NULL.
WARNING: CPU: 5 PID: 433 at lib/list_debug.c:25 __list_add_valid+0x57/0xa0
...
RIP: 0010:__list_add_valid+0x57/0xa0
...
Call Trace:
   <TASK>
   amd_sfh_get_report+0xba/0x110 [amd_sfh 78bf82e66cdb2ccf24cbe871a0835ef4eedddb17]
...

If HID-BPF were involved, we should see a call to hid_input_report() IMO.
Also AMD SFH calls hid_input_report() in a workqueue, so I would expect
a different stack trace.

I suspect commit 7bcfdab3f0c6 ("HID: amd_sfh: if no sensors are enabled,
clean up") because the stack trace reports a bad list_add, which could
happen if the object is not correctly initialized.

However, that commit was present in v6.2, so it might not be that one.
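
To illustrate the "bad list_add on an uninitialized object" failure mode
mentioned above, here is a minimal user-space sketch. It is not the kernel
code: struct list_head, init_list_head(), list_add_valid() and
list_add_checked() are simplified stand-ins written for this example, and
whether amd_sfh actually hits this exact path is an assumption. The point is
only that a zero-filled list head has next == NULL, which is the condition
the "next is NULL" message reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the kernel's doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* An initialized, empty list head points at itself in both directions. */
static void init_list_head(struct list_head *h)
{
    assert(h != NULL);
    h->next = h;
    h->prev = h;
}

/* Reduced model of the debug check: only the NULL tests are reproduced. */
static bool list_add_valid(const struct list_head *prev,
                           const struct list_head *next)
{
    if (prev == NULL) {
        fprintf(stderr, "list_add corruption. prev is NULL.\n");
        return false;
    }
    if (next == NULL) {
        fprintf(stderr, "list_add corruption. next is NULL.\n");
        return false;
    }
    return true;
}

/* Insert new_node right after head, refusing to touch a corrupt list. */
static bool list_add_checked(struct list_head *new_node,
                             struct list_head *head)
{
    struct list_head *prev = head;
    struct list_head *next = head->next; /* NULL if head was never initialized */

    if (!list_add_valid(prev, next))
        return false;

    next->prev = new_node;
    new_node->next = next;
    new_node->prev = prev;
    prev->next = new_node;
    return true;
}

/* Returns how many of the two insertions below were rejected. */
static int demo(void)
{
    struct list_head zeroed, ready, node;
    int rejected = 0;

    memset(&zeroed, 0, sizeof(zeroed)); /* e.g. kzalloc'ed but never initialized */
    init_list_head(&ready);

    if (!list_add_checked(&node, &zeroed))
        rejected++;
    if (!list_add_checked(&node, &ready))
        rejected++;
    return rejected;
}
```

In this model only the insertion into the zeroed head is rejected; the
initialized head accepts the node normally.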

Back to the merge commit: the hid-bpf branch was forked during the v6.1
cycle and only later merged into the hid tree. That might be why the
bisection lands on the merge: the AMD SFH code in the hid-bpf branch is
the one from the v6.1 kernel, and once it is merged into the v6.2+ branch,
you get different code for that driver.

Cheers,
Benjamin

[0] https://lore.kernel.org/regressions/f40e3897-76f1-2cd0-2d83-e48d87130eab@xxxxxxxxxxxx/#t

If I'm not mistaken, the Z13 doesn't actually have any
sensors connected to SFH.  So I think the suspicion on
7bcfdab3f0c6 and the theory that this is triggered by HID init
make a lot of sense.

Can you try this patch?

diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
index d9b7b01900b5..fa693a5224c6 100644
--- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
+++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
@@ -324,6 +324,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
                        devm_kfree(dev, cl_data->report_descr[i]);
                }
                dev_warn(dev, "Failed to discover, sensors not enabled is %d\n", cl_data->is_any_sensor_enabled);
+               cl_data->num_hid_devices = 0;
                return -EOPNOTSUPP;
        }
        schedule_delayed_work(&cl_data->work_buffer, msecs_to_jiffies(AMD_SFH_IDLE_LOOP));
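
For readers following along, the idea behind the one-line change can be
sketched in user space. This is a hypothetical model, not the driver: struct
client_data, client_init() and stale_entries() are names invented for this
example, and it assumes that some later code path loops over the device
count after the per-device buffers have already been freed on the
"no sensors enabled" path. Resetting the count to zero makes such a loop a
no-op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_DEVICES 4

/* Hypothetical stand-in for the driver's per-client bookkeeping. */
struct client_data {
    unsigned int num_devices;
    int *report[MAX_DEVICES];
};

/*
 * Model of an init routine: allocate one buffer per discovered device,
 * then bail out and free everything if no sensor turned out to be enabled.
 * reset_count selects whether the error path also clears num_devices,
 * which is what the patch above adds to the real driver.
 */
static bool client_init(struct client_data *cl, unsigned int found,
                        bool any_enabled, bool reset_count)
{
    unsigned int i;

    assert(found <= MAX_DEVICES);
    cl->num_devices = found;
    for (i = 0; i < found; i++)
        cl->report[i] = calloc(1, sizeof(int));

    if (!any_enabled) {
        for (i = 0; i < found; i++) {
            free(cl->report[i]);
            cl->report[i] = NULL;
        }
        if (reset_count)
            cl->num_devices = 0;
        return false;
    }
    return true;
}

/* Count the entries a later loop over num_devices would visit although
 * their buffers are already gone. */
static unsigned int stale_entries(const struct client_data *cl)
{
    unsigned int i, stale = 0;

    for (i = 0; i < cl->num_devices; i++)
        if (cl->report[i] == NULL)
            stale++;
    return stale;
}
```

Without the reset, a later walk over the devices still sees the old count
and touches entries that no longer exist; with the reset, there is nothing
left to visit.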



