Re: [Regression] wifi problems since tg3 started throwing rcu stall warnings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Frederic and Thorston,

(CC-ing the laptop's owner so that she might help with further testing...)

在 2024-10-23 18:22,Linux regression tracking (Thorsten Leemhuis) 写道:
On 23.10.24 12:09, Frederic Weisbecker wrote:
Le Wed, Oct 23, 2024 at 10:27:18AM +0200, Linux regression tracking (Thorsten Leemhuis) a écrit :
Hi, Thorsten here, the Linux kernel's regression tracker.

Frederic, I noticed a report about a regression in bugzilla.kernel.org
that appears to be caused by the following change of yours:

55d4669ef1b768 ("rcu: Fix rcu_barrier() VS post CPUHP_TEARDOWN_CPU
invocation")

Are you sure about the commit? Below it says:

Not totally, but...

As many (most?) kernel developers don't keep an eye on the bug tracker,
I decided to write this mail. To quote from
https://bugzilla.kernel.org/show_bug.cgi?id=219390:

 Mingcong Bai 2024-10-15 13:32:35 UTC

Since aa162aa4aa383a0a714b1c36e8fcc77612ddd1a2 between v6.10.4 and

Now that's aa162aa4aa383a0a714b1c36e8fcc77612ddd1a2 which I can't find in vanilla
tree.

...quite, as that is the commit-id of the backport to v6.10.5; and the
reporter reverted it there. Ideally of course that would have happened
on recent mainline. If you doubt, ask Mingcong Bai to check if a revert
there helps, too.

Do we need any further information/testing on this issue? Please let me know if there's anything we can do as the issue still persists in 6.12.

Best Regards,
Mingcong Bai


HTH, Ciao, Thorsten

Also I'm failing to see an immediate issue between the below stacktrace
and the above commit. So are we sure about that reference?

Thanks.


v6.10.5, the Broadcom Tigon3 Ethernet interface (tg3) found on Apple
MacBook Pro (15'', Mid 2010) would throw many rcu stall errors during
boot up, causing peripherals such as the wireless card to misbehave:

[ 24.153855] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 2-.... } 21 jiffies s: 973 root: 0x4/. [ 24.166938] rcu: blocking rcu_node structures (internal RCU debug):
[   24.177800] Sending NMI from CPU 3 to CPUs 2:
[   24.183113] NMI backtrace for cpu 2
[ 24.183119] CPU: 2 PID: 1049 Comm: NetworkManager Not tainted 6.10.5-aosc-main #1 [ 24.183123] Hardware name: Apple Inc. MacBookPro6,2/Mac-F22586C8, BIOS MBP61.88Z.005D.B00.1804100943 04/10/18
[   24.183125] RIP: 0010:__this_module+0x2d3d1/0x4f310 [tg3]
[ 24.183135] Code: c3 cc cc cc cc 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 89 f6 48 03 77 30 8b 06 <31> f6 31 ff c3 cc cc cc cc 66 0f 1f 44 00 00 90 90 90 90 90 90 90
[   24.183138] RSP: 0018:ffffbf1a011d75e8 EFLAGS: 00000082
[ 24.183141] RAX: 0000000000000000 RBX: ffffa04ec78f8a00 RCX: 0000000000000000 [ 24.183143] RDX: 0000000000000000 RSI: ffffbf1a00fb007c RDI: ffffa04ec78f8a00 [ 24.183145] RBP: 0000000000000b50 R08: 0000000000000000 R09: 0000000000000000 [ 24.183147] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000216 [ 24.183148] R13: ffffbf1a011d7624 R14: ffffa04ec78f8a08 R15: ffffa04ec78f8b40 [ 24.183151] FS: 00007f4c524b2140(0000) GS:ffffa05007d00000(0000) knlGS:0000000000000000
[   24.183153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 24.183155] CR2: 00007f7025eae3e8 CR3: 00000001040f8000 CR4: 00000000000006f0
[   24.183157] Call Trace:
[   24.183162]  <NMI>
[   24.183167]  ? nmi_cpu_backtrace+0xbf/0x140
[   24.183175]  ? nmi_cpu_backtrace_handler+0x11/0x20
[   24.183181]  ? nmi_handle+0x61/0x160
[   24.183186]  ? default_do_nmi+0x42/0x110
[   24.183191]  ? exc_nmi+0x1bd/0x290
[   24.183194]  ? end_repeat_nmi+0xf/0x53
[   24.183203]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183207]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183210]  ? __this_module+0x2d3d1/0x4f310 [tg3]
[   24.183213]  </NMI>
[   24.183214]  <TASK>
[   24.183215]  __this_module+0x31828/0x4f310 [tg3]
[   24.183218]  ? __this_module+0x2d390/0x4f310 [tg3]
[   24.183221]  __this_module+0x398e6/0x4f310 [tg3]
[   24.183225]  __this_module+0x3baf8/0x4f310 [tg3]
[   24.183229]  __this_module+0x4733f/0x4f310 [tg3]
[   24.183233]  ? _raw_spin_unlock_irqrestore+0x25/0x70
[   24.183237]  ? __this_module+0x398e6/0x4f310 [tg3]
[   24.183241]  __this_module+0x4b943/0x4f310 [tg3]
[   24.183244]  ? delay_tsc+0x89/0xf0
[   24.183249]  ? preempt_count_sub+0x51/0x60
[   24.183254]  __this_module+0x4be4b/0x4f310 [tg3]
[   24.183258]  __dev_open+0x103/0x1c0
[   24.183265]  __dev_change_flags+0x1bd/0x230
[   24.183269]  ? rtnl_getlink+0x362/0x400
[   24.183276]  dev_change_flags+0x26/0x70
[   24.183280]  do_setlink+0xe16/0x11f0
[   24.183286]  ? __nla_validate_parse+0x61/0xd40
[   24.183295]  __rtnl_newlink+0x63d/0x9f0
[   24.183301]  ? kmem_cache_alloc_node_noprof+0x12b/0x360
[   24.183308]  ? kmalloc_trace_noprof+0x11e/0x350
[   24.183312]  ? rtnl_newlink+0x2e/0x70
[   24.183316]  rtnl_newlink+0x47/0x70
[   24.183320]  rtnetlink_rcv_msg+0x152/0x400
[   24.183324]  ? __netlink_sendskb+0x68/0x90
[   24.183329]  ? netlink_unicast+0x237/0x290
[   24.183333]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
[   24.183336]  netlink_rcv_skb+0x5b/0x110
[   24.183343]  netlink_unicast+0x1a4/0x290
[   24.183347]  netlink_sendmsg+0x222/0x4a0
[   24.183350]  ? proc_get_long.constprop.0+0x116/0x210
[   24.183358]  ____sys_sendmsg+0x379/0x3b0
[   24.183363]  ? copy_msghdr_from_user+0x6d/0xb0
[   24.183368]  ___sys_sendmsg+0x86/0xe0
[   24.183372]  ? addrconf_sysctl_forward+0xf3/0x270
[   24.183378]  ? _copy_from_iter+0x8b/0x570
[   24.183384]  ? __pfx_addrconf_sysctl_forward+0x10/0x10
[   24.183388]  ? _raw_spin_unlock+0x19/0x50
[   24.183392]  ? proc_sys_call_handler+0xf3/0x2f0
[   24.183397]  ? trace_hardirqs_on+0x29/0x90
[   24.183401]  ? __fdget+0xc2/0xf0
[   24.183405]  __sys_sendmsg+0x5b/0xc0
[   24.183410]  ? syscall_trace_enter+0x110/0x1b0
[   24.183416]  do_syscall_64+0x64/0x150
[   24.183423]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

I have bisected the error to this commit. Reverting it caused no new or
perceivable issues on both the MacBook and a Zen4-based laptop.

[...]

Ohh, and when you say "causing peripherals such as the wireless card to
misbehave" what exactly do you mean?

When the kernel throws rcu stall messages, the wireless card on the
MacBook may fail to discover and/or connect to wireless networks - not a consistent behaviour but I suppose that something in the kernel got stuck.

See the ticket for more details and dmesg logs; the problem still
happens with 6.12-rc. The reporter is CCed.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

P.S.: let me use this mail to also add the report to the list of tracked
regressions to ensure it's doesn't fall through the cracks:

#regzbot introduced: 55d4669ef1b76823083caecfab12a8bd2ccdcf64
#regzbot from: Mingcong Bai <jeffbai@xxxxxxx>
#regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=219390 #regzbot title: rcu: wifi problems since tg3 started throwing rcu stall
warnings
#regzbot ignore-activity







[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux