Re: Kernel deadlock in 6.7.5 + hacks, maybe debugfs related.

On 2/27/24 05:47, Johannes Berg wrote:

Feb 26 06:01:45 ct523c-0b0b kernel: task:ip              state:D stack:0     pid:28125 tgid:28125 ppid:3604   flags:0x00004002
Feb 26 06:01:45 ct523c-0b0b kernel: Call Trace:
Feb 26 06:01:45 ct523c-0b0b kernel:  <TASK>
Feb 26 06:01:45 ct523c-0b0b kernel:  __schedule+0x42c/0xde0
Feb 26 06:01:45 ct523c-0b0b kernel:  schedule+0x3c/0x120
Feb 26 06:01:45 ct523c-0b0b kernel:  schedule_timeout+0x19c/0x1b0
Feb 26 06:01:45 ct523c-0b0b kernel:  ? mark_held_locks+0x49/0x70
Feb 26 06:01:45 ct523c-0b0b kernel:  __wait_for_common+0xba/0x1d0
Feb 26 06:01:45 ct523c-0b0b kernel:  ? usleep_range_state+0xb0/0xb0
Feb 26 06:01:45 ct523c-0b0b kernel:  remove_one+0x6b/0x100

Can you say where this remove_one+0x6b is?

I feel it's probably this:

        if (!refcount_dec_and_test(&fsd->active_users)) {
                wait_for_completion(&fsd->active_users_drained);

which ... looking at it, seems wrong?
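
To make the concern concrete, here is a minimal userspace model of the pre-patch control flow. It is only a sketch: it assumes the remover is dropping the initial reference, and names such as model_old_remove() and the enum values are hypothetical, not kernel symbols.

```c
#include <stdbool.h>

/* Hypothetical userspace model for illustration; this is not kernel code. */
enum remove_path {
	PATH_WAIT_ONLY,         /* blocks in wait_for_completion(), no cancel */
	PATH_CANCEL_LOOP_EMPTY  /* reaches the cancel loop with nothing to do */
};

/* Models refcount_dec_and_test(): decrement, return true on reaching zero. */
static bool model_dec_and_test(int *count)
{
	return --(*count) == 0;
}

/* Models the pre-patch branch structure of __debugfs_file_removed(). */
static enum remove_path model_old_remove(int active_users)
{
	/* Other users remain: the old code only waits and never cancels. */
	if (!model_dec_and_test(&active_users))
		return PATH_WAIT_ONLY;

	/* Count is zero: while (refcount_read(...)) is false immediately. */
	return PATH_CANCEL_LOOP_EMPTY;
}
```

In this model, the cancellation loop is only reachable when there is nobody left to cancel. If the remaining user is itself blocked on something the remover holds, the wait never finishes.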

_Completely_ untested:

diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
index 034a617cb1a5..fb636478c54d 100644
--- a/fs/debugfs/inode.c
+++ b/fs/debugfs/inode.c
@@ -751,13 +751,19 @@ static void __debugfs_file_removed(struct dentry *dentry)
  	if ((unsigned long)fsd & DEBUGFS_FSDATA_IS_REAL_FOPS_BIT)
  		return;
-	/* if we hit zero, just wait for all to finish */
-	if (!refcount_dec_and_test(&fsd->active_users)) {
-		wait_for_completion(&fsd->active_users_drained);
-		return;
-	}
+	/*
+	 * Now that debugfs_file_get() no longer sees a valid entry,
+	 * decrement the refcount to remove the initial reference.
+	 */
+	refcount_dec(&fsd->active_users);
-	/* if we didn't hit zero, try to cancel any we can */
+	/*
+	 * As long as it's not zero, try to cancel any cancellations,
+	 * new incoming ones will wake up the completion as we might
+	 * have raced: debugfs_file_get() had already been done, but
+	 * debugfs_enter_cancellation() hadn't, by the time we got
+	 * to this point here.
+	 */
  	while (refcount_read(&fsd->active_users)) {
  		struct debugfs_cancellation *c;

I see this splat with the patch applied:

[   94.576688] ------------[ cut here ]------------
[   94.576699] refcount_t: decrement hit 0; leaking memory.
[   94.576717] WARNING: CPU: 1 PID: 5686 at lib/refcount.c:31 refcount_warn_saturate+0x42/0xe0
[   94.576724] Modules linked in: nf_conntrack_netlink nfnetlink xt_MASQUERADE iptable_nat nf_nat iptable_raw xt_CT nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter vrf 8021q garp mrp stp llc macvlan pktgen rpcrdma rdma_cm iw_cm ib_cm ib_core qrtr iTCO_wdt intel_pmc_bxt ee1004 intel_rapl_msr iTCO_vendor_support snd_hda_codec_hdmi mt7915e mt76_connac_lib snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mt76 coretemp intel_rapl_common mac80211 intel_tcc_cooling snd_hda_intel x86_pkg_temp_thermal intel_powerclamp snd_intel_dspcfg snd_hda_codec intel_wmi_thunderbolt cfg80211 snd_hda_core snd_hwdep pl2303 bfq snd_seq mei_hdcp mei_pxp snd_seq_device snd_pcm snd_timer i2c_i801 snd intel_pch_thermal soundcore i2c_smbus pcspkr acpi_pad nfsd auth_rpcgss nfs_acl sch_fq_codel lockd grace sunrpc fuse zram raid1 dm_raid raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq i915 drm_buddy intel_gtt drm_display_helper drm_kms_helper cec igb rc_core i2c_algo_bit ttm drm ixgbe agpgart mdio
[   94.576822]  dca xhci_pci mei_wdt i2c_core xhci_pci_renesas hwmon video wmi [last unloaded: nfnetlink]
[   94.576833] CPU: 1 PID: 5686 Comm: iw Not tainted 6.7.5+ #2
[   94.576836] Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
[   94.576838] RIP: 0010:refcount_warn_saturate+0x42/0xe0
[   94.576841] Code: 80 3d 24 70 3b 01 00 0f 84 a0 00 00 00 c3 80 3d 15 70 3b 01 00 75 f6 48 c7 c7 10 a5 66 82 c6 05 05 70 3b 01 01 e8 9e a9 a9 ff <0f> 0b c3 80 3d f9 6f 3b 01 00 75 d7 48 c7 c7 90 a4 66 82 c6 05 e9
[   94.576843] RSP: 0018:ffffc900063df848 EFLAGS: 00010282
[   94.576846] RAX: 0000000000000000 RBX: ffff88810e752750 RCX: 0000000000000027
[   94.576848] RDX: ffff88845dc5c708 RSI: 0000000000000001 RDI: ffff88845dc5c700
[   94.576850] RBP: ffff88811b8af400 R08: 0000000000000000 R09: ffffc900063df6f0
[   94.576851] R10: 0000000000000003 R11: ffffffff8296a2e8 R12: ffff88811000b1d8
[   94.576853] R13: ffff88810e7dcea0 R14: 0000000000000000 R15: ffff88810e752868
[   94.576855] FS:  00007f31ae1f3b80(0000) GS:ffff88845dc40000(0000) knlGS:0000000000000000
[   94.576857] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   94.576859] CR2: 00007f0533889a90 CR3: 0000000129c8f003 CR4: 00000000003706f0
[   94.576861] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   94.576862] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   94.576864] Call Trace:
[   94.576865]  <TASK>
[   94.576867]  ? refcount_warn_saturate+0x42/0xe0
[   94.576869]  ? __warn+0x7c/0x170
[   94.576873]  ? refcount_warn_saturate+0x42/0xe0
[   94.576877]  ? report_bug+0x169/0x1a0
[   94.576883]  ? handle_bug+0x41/0x70
[   94.576887]  ? exc_invalid_op+0x13/0x60
[   94.576890]  ? asm_exc_invalid_op+0x16/0x20
[   94.576900]  ? refcount_warn_saturate+0x42/0xe0
[   94.576903]  ? refcount_warn_saturate+0x42/0xe0
[   94.576905]  remove_one+0xde/0xf0
[   94.576910]  simple_recursive_removal+0x20c/0x2b0
[   94.576914]  ? start_creating.part.0+0x170/0x170
[   94.576919]  debugfs_remove+0x3b/0x60
[   94.576922]  ieee80211_debugfs_remove_netdev+0x15/0x30 [mac80211]
[   94.576998]  ieee80211_teardown_sdata+0x13/0x50 [mac80211]
[   94.577036]  unregister_netdevice_many_notify+0x3b9/0x7e0
[   94.577045]  unregister_netdevice_queue+0x84/0xc0
[   94.577049]  _cfg80211_unregister_wdev+0x1c5/0x210 [cfg80211]
[   94.577117]  ieee80211_if_remove+0x9b/0x110 [mac80211]
[   94.577166]  ieee80211_del_iface+0xc/0x10 [mac80211]
[   94.577220]  cfg80211_remove_virtual_intf+0x42/0x120 [cfg80211]
[   94.577257]  genl_family_rcv_msg_doit+0xd1/0x120
[   94.577267]  genl_rcv_msg+0x182/0x290
[   94.577270]  ? __cfg80211_wdev_from_attrs+0x2b0/0x2b0 [cfg80211]
[   94.577306]  ? nl80211_stop_ap+0x30/0x30 [cfg80211]
[   94.577341]  ? nlmsg_trim+0x20/0x20 [cfg80211]
[   94.577378]  ? genl_family_rcv_msg_dumpit+0xf0/0xf0
[   94.577383]  netlink_rcv_skb+0x4f/0x100
[   94.577392]  genl_rcv+0x1f/0x30
[   94.577395]  netlink_unicast+0x18e/0x270
[   94.577400]  netlink_sendmsg+0x257/0x4d0
[   94.577406]  __sock_sendmsg+0x33/0x60
[   94.577411]  ____sys_sendmsg+0x22c/0x2a0
[   94.577414]  ? copy_msghdr_from_user+0x68/0xa0
[   94.577420]  ___sys_sendmsg+0x81/0xc0
[   94.577424]  ? __lock_acquire+0x405/0x2380
[   94.577430]  ? __lock_acquire+0x405/0x2380
[   94.577435]  ? reacquire_held_locks+0xd3/0x1f0
[   94.577438]  ? do_user_addr_fault+0x322/0x850
[   94.577443]  ? lock_acquire+0xc6/0x2b0
[   94.577448]  __sys_sendmsg+0x52/0xa0
[   94.577456]  do_syscall_64+0x3b/0x110
[   94.577460]  entry_SYSCALL_64_after_hwframe+0x46/0x4e
[   94.577464] RIP: 0033:0x7f31adf13737
[   94.577467] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[   94.577469] RSP: 002b:00007ffd1e029ae8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[   94.577472] RAX: ffffffffffffffda RBX: 0000000000924390 RCX: 00007f31adf13737
[   94.577474] RDX: 0000000000000000 RSI: 00007ffd1e029b40 RDI: 0000000000000004
[   94.577476] RBP: 0000000000929780 R08: 00000000009242a0 R09: 0000000000000000
[   94.577478] R10: 00007f31ae20a3e0 R11: 0000000000000246 R12: 00000000009298c0
[   94.577479] R13: 00007ffd1e029b40 R14: 00007ffd1e029c70 R15: 000000000043d280
[   94.577487]  </TASK>
[   94.577488] irq event stamp: 12053
[   94.577490] hardirqs last  enabled at (12059): [<ffffffff8121af54>] console_unlock+0x114/0x140
[   94.577495] hardirqs last disabled at (12064): [<ffffffff8121af39>] console_unlock+0xf9/0x140
[   94.577497] softirqs last  enabled at (11844): [<ffffffff81188b97>] __irq_exit_rcu+0x77/0xa0
[   94.577500] softirqs last disabled at (11839): [<ffffffff81188b97>] __irq_exit_rcu+0x77/0xa0
[   94.577502] ---[ end trace 0000000000000000 ]---
[  103.657993] workqueue: gc_worker [nf_conntrack] hogged CPU for >10000us 8 times, consider switching to WQ_UNBOUND
[  148.747435] workqueue: gc_worker [nf_conntrack] hogged CPU for >10000us 16 times, consider switching to WQ_UNBOUND
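
For what it's worth, the "decrement hit 0; leaking memory" warning matches how refcount_dec() is meant to be used: it treats reaching zero as a bug, since the caller gets no chance to react to the final put, whereas refcount_dec_and_test() exists for exactly that case. A rough userspace model of the difference follows. This is a sketch only: struct model_refcount and both helpers are hypothetical stand-ins, not the kernel implementation, and saturation handling is left out.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for refcount_t; not the kernel type. */
struct model_refcount {
	int value;
	bool warned; /* set where the kernel would print the WARNING */
};

/* Models refcount_dec(): dropping the last reference is flagged as a bug. */
static void model_refcount_dec(struct model_refcount *r)
{
	int old = r->value--;

	if (old <= 1)
		r->warned = true; /* "decrement hit 0; leaking memory" */
}

/* Models refcount_dec_and_test(): reaching zero is the expected outcome. */
static bool model_refcount_dec_and_test(struct model_refcount *r)
{
	return --r->value == 0;
}
```

Under these assumptions, an unconditional refcount_dec() of the initial reference warns whenever no other user holds the file at removal time, which fits the `iw` interface removal in the trace above.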

Thanks,
Ben



johannes


--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com
