Re: amd_sfh driver causes kernel oops during boot

On Jun 06 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
> 
> On 06.06.23 04:36, Bagas Sanjaya wrote:
> > On Mon, Jun 05, 2023 at 01:24:25PM +0200, Malte Starostik wrote:
> >> Hello,
> >>
> >> chiming in here as I'm experiencing what looks like the exact same issue, also 
> >> on a Lenovo Z13 notebook, also on Arch:
> >> Oops during startup in task udev-worker followed by udev-worker blocking all 
> >> attempts to suspend or cleanly shutdown/reboot the machine - in fact I first 
> >> noticed because the machine surprised with repeatedly running out of battery 
> >> after it had supposedly been in standby but couldn't. Only then I noticed the 
> >> error on boot.
> >>
> >> bisect result:
> >> 904e28c6de083fa4834cdbd0026470ddc30676fc is the first bad commit
> >> commit 904e28c6de083fa4834cdbd0026470ddc30676fc
> >> Merge: a738688177dc 2f7f4efb9411
> >> Author: Benjamin Tissoires <benjamin.tissoires@xxxxxxxxxx>
> >> Date:   Wed Feb 22 10:44:31 2023 +0100
> >>
> >>     Merge branch 'for-6.3/hid-bpf' into for-linus
> > 
> > Hmm, seems like bad bisect (bisected to HID-BPF which IMO isn't related
> > to amd_sfh). Can you repeat the bisection?
> 
> Well, amd_sfh afaics apparently interacts with HID (see trace earlier in
> the thread), so it's not that far away. But it's a merge commit, which
> is possible, but doesn't happen every day. So a recheck might really be
> a good idea.

Let's not rule out that there is a bad interaction between HID-BPF and
AMD SFH. HID-BPF is able to process any incoming HID event, whether it
comes from AMD SFH, USB, BT, I2C or anything else.

However, looking at the stack trace in the initial report[0], it seems
we are getting the oops/stack traces while we are still in amd_sfh:

list_add corruption. next is NULL.
WARNING: CPU: 5 PID: 433 at lib/list_debug.c:25 __list_add_valid+0x57/0xa0
...
RIP: 0010:__list_add_valid+0x57/0xa0
...
Call Trace:
  <TASK>
  amd_sfh_get_report+0xba/0x110 [amd_sfh 78bf82e66cdb2ccf24cbe871a0835ef4eedddb17]
...

If HID-BPF were involved, we should see a call to hid_input_report() IMO.
Also AMD SFH calls hid_input_report() in a workqueue, so I would expect
a different stack trace.

I suspect commit 7bcfdab3f0c6 ("HID: amd_sfh: if no sensors are
enabled, clean up"), because the stack trace reports a bad list_add,
which could happen if the object is not correctly initialized.

However, that commit was already present in v6.2, so it might not be
the culprit.
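
To illustrate what "list_add corruption. next is NULL." means in general,
here is a minimal userspace sketch. It is not amd_sfh code: struct demo_dev,
its requests member and the demo_* helpers are hypothetical, and
list_add_valid() is a simplified stand-in for the kernel's list debug
checks that returns false instead of warning. It only shows that adding to
a zeroed list head that was never initialized trips the "next is NULL"
check, while the same add on an initialized head passes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Equivalent of INIT_LIST_HEAD(): an empty list points to itself. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Simplified sanity checks before linking an entry between prev and next.
 * Returns false where the kernel would report list_add corruption.
 */
static bool list_add_valid(const struct list_head *entry,
			   const struct list_head *prev,
			   const struct list_head *next)
{
	if (prev == NULL || next == NULL)
		return false;	/* head was never initialized */
	if (next->prev != prev || prev->next != next)
		return false;	/* neighbours do not agree */
	if (entry == prev || entry == next)
		return false;	/* double add */
	return true;
}

/* list_add() with the check in front: insert entry right after head. */
static bool list_add_checked(struct list_head *entry, struct list_head *head)
{
	struct list_head *prev = head;
	struct list_head *next = head->next;

	if (!list_add_valid(entry, prev, next))
		return false;

	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
	return true;
}

/* Hypothetical driver-private data holding a list of pending requests. */
struct demo_dev {
	struct list_head requests;
};

/* Zeroed object (as after kzalloc) whose list head was never set up. */
bool demo_add_without_init(void)
{
	struct demo_dev dev = { { NULL, NULL } };
	struct list_head req;

	return list_add_checked(&req, &dev.requests);
}

/* Same object, but the list head is initialized before first use. */
bool demo_add_with_init(void)
{
	struct demo_dev dev = { { NULL, NULL } };
	struct list_head req;

	init_list_head(&dev.requests);
	return list_add_checked(&req, &dev.requests);
}
```

Calling demo_add_without_init() returns false because head->next is NULL,
which is the condition the warning in the trace describes;
demo_add_with_init() returns true. Whether the list in the report was
skipped during init or torn down early is something the trace alone does
not tell us.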

Back to the merge commit: the hid-bpf branch was forked during the v6.1
cycle and only later merged into the hid tree. That might be why the
bisection lands on the merge: the AMD SFH code in the hid-bpf branch is
the one from the v6.1 kernel, and once it is merged into the v6.2+
branch, you get different code for that driver.

Cheers,
Benjamin

[0] https://lore.kernel.org/regressions/f40e3897-76f1-2cd0-2d83-e48d87130eab@xxxxxxxxxxxx/#t



