Re: Fwd: RCU indicates stalls with iwlwifi, causing boot failures

On 9/1/23 5:29 PM, Bagas Sanjaya wrote:
Hi,

I notice a bug report on Bugzilla [1]. Quoting from it:

Try booting with pci=noaer?

That fixes the only known iwlwifi bug we have found in 6.5, but we are also
mostly using the backports iwlwifi driver...

Thanks,
Ben


I'm seeing RCU warnings in Linus's current tree (e.g. at commit 87dfd85c38923acd9517e8df4afc908565df0961):

WARNING: CPU: 0 PID: 0 at kernel/rcu/tree_exp.h:787 rcu_exp_handler+0x35/0xe0

But they *ONLY* occur on a system with a newer iwlwifi device:

aa:00.0 Network controller: Intel Corporation Wi-Fi 6 AX210/AX211/AX411 160MHz (rev 1a)

and never in a VM or on an older device (like an 8260).  During a bisect, they only seem to occur with the "83" version of the firmware:

iwlwifi 0000:aa:00.0: loaded firmware version 83.e8f84e98.0 ty-a0-gf-a0-83.ucode op_mode iwlmvm

The first warning gets spit out within a millisecond of the last printk() from the iwlwifi driver.  They eventually result in a big spew of RCU messages like this:

[   27.124796] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-...D } 125 jiffies s: 193 root: 0x1/.
[   27.126466] rcu: blocking rcu_node structures (internal RCU debug):
[   27.128114] Sending NMI from CPU 3 to CPUs 0:
[   27.128122] NMI backtrace for cpu 0 skipped: idling at intel_idle+0x5f/0xb0
[   27.159757] loop30: detected capacity change from 0 to 8
[   27.204967] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-...D } 145 jiffies s: 193 root: 0x1/.
[   27.206353] rcu: blocking rcu_node structures (internal RCU debug):
[   27.207751] Sending NMI from CPU 3 to CPUs 0:
[   27.207825] NMI backtrace for cpu 0 skipped: idling at intel_idle+0x5f/0xb0

I usually see them at boot.  In that case, they usually hang the system and keep it from booting.  I've also encountered them at reboots and also seen them *not* be fatal at boot.  I suspect it has to do with which CPU gets wedged.
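
To compare which CPU is reported as stalled from one boot to the next, the expedited-stall lines can be pulled out of a saved dmesg. What follows is only a minimal Python sketch, assuming the line format quoted above. The function name and the regular expression are illustrative, are not part of any kernel tooling, and may need adjusting for other kernel versions:

```python
import re

# Matches expedited-stall lines in the format quoted in this report.
# This format is an assumption; other kernels may print it differently.
STALL_RE = re.compile(
    r"rcu: INFO: (\w+) detected expedited stalls on CPUs/tasks: "
    r"\{([^}]*)\} (\d+) jiffies"
)


def parse_expedited_stalls(dmesg_text):
    """Return a list of (flavor, [cpu numbers], jiffies) tuples."""
    stalls = []
    for match in STALL_RE.finditer(dmesg_text):
        flavor, entries, jiffies = match.groups()
        # CPU entries look like "0-...D"; the CPU number precedes the dash.
        cpus = [int(cpu) for cpu in re.findall(r"(\d+)-\S+", entries)]
        stalls.append((flavor, cpus, int(jiffies)))
    return stalls


# Two of the lines from the log above, used as sample input.
SAMPLE = """\
[   27.124796] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-...D } 125 jiffies s: 193 root: 0x1/.
[   27.204967] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-...D } 145 jiffies s: 193 root: 0x1/.
"""

if __name__ == "__main__":
    for flavor, cpus, jiffies in parse_expedited_stalls(SAMPLE):
        print(f"{flavor}: CPUs {cpus} stalled for {jiffies} jiffies")
```

Running this over the dmesg from a fatal boot and from a non-fatal one would show whether the stalled CPU differs between them.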

See Bugzilla for the full thread and attached full dmesg output.

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217856



--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com


