On Mon, 2024-02-26 at 06:58 -0800, Ben Greear wrote: > I managed to catch it in a lockdep enabled kernel. There were no lockdep > warnings, but more lock info was printed at least: > > Maybe ordering of this one's lock printout means it is 'ip' that is actually holding the rtnl? > > [Seen in context further below] > Feb 26 06:01:45 ct523c-0b0b kernel: 3 locks held by ip/28125: > Feb 26 06:01:45 ct523c-0b0b kernel: #0: ffffffff82a6d588 (rtnl_mutex){+.+.}-{4:4}, at: rtnetlink_rcv_msg+0x13e/0x620 > Feb 26 06:01:45 ct523c-0b0b kernel: #1: ffff888122e60768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: ieee80211_stop+0x2d/0x180 [mac80211] > Feb 26 06:01:45 ct523c-0b0b kernel: #2: ffff88812661d768 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: simple_recursive_removal+0x189/0x2b0 Yes, probably. > Feb 26 06:01:45 ct523c-0b0b kernel: INFO: task kworker/6:3:542 blocked for more than 180 seconds. > Feb 26 06:01:45 ct523c-0b0b kernel: Not tainted 6.7.5+ #1 > Feb 26 06:01:45 ct523c-0b0b kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Feb 26 06:01:45 ct523c-0b0b kernel: task:kworker/6:3 state:D stack:0 pid:542 tgid:542 ppid:2 flags:0x00004000 > Feb 26 06:01:45 ct523c-0b0b kernel: Workqueue: ipv6_addrconf addrconf_verify_work > Feb 26 06:01:45 ct523c-0b0b kernel: Call Trace: > Feb 26 06:01:45 ct523c-0b0b kernel: <TASK> > Feb 26 06:01:45 ct523c-0b0b kernel: __schedule+0x42c/0xde0 > Feb 26 06:01:45 ct523c-0b0b kernel: schedule+0x3c/0x120 > Feb 26 06:01:45 ct523c-0b0b kernel: schedule_preempt_disabled+0xc/0x20 > Feb 26 06:01:45 ct523c-0b0b kernel: __mutex_lock+0x5c4/0xc00 > Feb 26 06:01:45 ct523c-0b0b kernel: ? lock_acquire+0xc6/0x2b0 > Feb 26 06:01:45 ct523c-0b0b kernel: ? addrconf_verify_work+0x9/0x20 > Feb 26 06:01:45 ct523c-0b0b kernel: ? process_one_work+0x18f/0x4e0 > Feb 26 06:01:45 ct523c-0b0b kernel: ? addrconf_verify_work+0x9/0x20 > Feb 26 06:01:45 ct523c-0b0b kernel: ? process_one_work+0x18f/0x4e0 > Feb 26 06:01:45 ct523c-0b0b kernel: addrconf_verify_work+0x9/0x20 Lots of things like this are just blocked on RTNL. > Feb 26 06:01:45 ct523c-0b0b kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Feb 26 06:01:45 ct523c-0b0b kernel: task:kworker/u16:1 state:D stack:0 pid:8298 tgid:8298 ppid:2 flags:0x00004000 > Feb 26 06:01:45 ct523c-0b0b kernel: Workqueue: events_unbound cfg80211_wiphy_work [cfg80211] > Feb 26 06:01:45 ct523c-0b0b kernel: Call Trace: > Feb 26 06:01:45 ct523c-0b0b kernel: <TASK> > Feb 26 06:01:45 ct523c-0b0b kernel: __schedule+0x42c/0xde0 > Feb 26 06:01:45 ct523c-0b0b kernel: schedule+0x3c/0x120 > Feb 26 06:01:45 ct523c-0b0b kernel: schedule_preempt_disabled+0xc/0x20 > Feb 26 06:01:45 ct523c-0b0b kernel: __mutex_lock+0x5c4/0xc00 > Feb 26 06:01:45 ct523c-0b0b kernel: ? cfg80211_wiphy_work+0x1a/0xc0 [cfg80211] > Feb 26 06:01:45 ct523c-0b0b kernel: ? process_one_work+0x18f/0x4e0 > Feb 26 06:01:45 ct523c-0b0b kernel: ? cfg80211_wiphy_work+0x1a/0xc0 [cfg80211] > Feb 26 06:01:45 ct523c-0b0b kernel: cfg80211_wiphy_work+0x1a/0xc0 [cfg80211] this is blocked on wiphy mutex. > Feb 26 06:01:45 ct523c-0b0b kernel: INFO: task ip:28125 blocked for more than 180 seconds. > Feb 26 06:01:45 ct523c-0b0b kernel: Not tainted 6.7.5+ #1 > Feb 26 06:01:45 ct523c-0b0b kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Feb 26 06:01:45 ct523c-0b0b kernel: task:ip state:D stack:0 pid:28125 tgid:28125 ppid:3604 flags:0x00004002 > Feb 26 06:01:45 ct523c-0b0b kernel: Call Trace: > Feb 26 06:01:45 ct523c-0b0b kernel: <TASK> > Feb 26 06:01:45 ct523c-0b0b kernel: __schedule+0x42c/0xde0 > Feb 26 06:01:45 ct523c-0b0b kernel: schedule+0x3c/0x120 > Feb 26 06:01:45 ct523c-0b0b kernel: schedule_timeout+0x19c/0x1b0 > Feb 26 06:01:45 ct523c-0b0b kernel: ? mark_held_locks+0x49/0x70 > Feb 26 06:01:45 ct523c-0b0b kernel: __wait_for_common+0xba/0x1d0 > Feb 26 06:01:45 ct523c-0b0b kernel: ? usleep_range_state+0xb0/0xb0 > Feb 26 06:01:45 ct523c-0b0b kernel: remove_one+0x6b/0x100 > Feb 26 06:01:45 ct523c-0b0b kernel: simple_recursive_removal+0x20c/0x2b0 > Feb 26 06:01:45 ct523c-0b0b kernel: ? start_creating.part.0+0x170/0x170 > Feb 26 06:01:45 ct523c-0b0b kernel: debugfs_remove+0x3b/0x60 > Feb 26 06:01:45 ct523c-0b0b kernel: ieee80211_debugfs_recreate_netdev+0x30/0xac0 [mac80211] This seems to be blocked on the wait_for_completion()?! Which means it should have actually triggered the c->cancel() first, but ... if wiphy_locked_debugfs_read_cancel() had actually been called, I'm not sure why we'd be stuck here? Then again, I'm looking at the code now and not sure why I put this + /* if we're already removing wake it up to cancel */ + if (d_unlinked(dentry)) + complete(&fsd->active_users_drained); into debugfs_enter_cancellation()... > Feb 26 06:01:45 ct523c-0b0b kernel: > Showing all locks held in the system: This isn't always clear on holding vs. acquiring, FWIW. > Feb 26 06:01:45 ct523c-0b0b kernel: 3 locks held by kworker/6:3/542: > Feb 26 06:01:45 ct523c-0b0b kernel: #0: ffff888109b33b38 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_one_work+0x18f/0x4e0 > Feb 26 06:01:45 ct523c-0b0b kernel: #1: ffffc90000f77e58 ((work_completion)(&(&net->ipv6.addr_chk_work)->work)){+.+.}-{0:0}, at: process_one_work+0x18f/0x4e0 > Feb 26 06:01:45 ct523c-0b0b kernel: #2: ffffffff82a6d588 (rtnl_mutex){+.+.}-{4:4}, at: addrconf_verify_work+0x9/0x20 See this, it's actually stuck on acquiring the RTNL. > Feb 26 06:01:45 ct523c-0b0b kernel: 3 locks held by kworker/u16:1/8298: > Feb 26 06:01:45 ct523c-0b0b kernel: #0: ffff888108058d38 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x18f/0x4e0 > Feb 26 06:01:45 ct523c-0b0b kernel: #1: ffffc9000abcfe58 ((work_completion)(&rdev->wiphy_work)){+.+.}-{0:0}, at: process_one_work+0x18f/0x4e0 > Feb 26 06:01:45 ct523c-0b0b kernel: #2: ffff888122e60768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: cfg80211_wiphy_work+0x1a/0xc0 [cfg80211] And this is stuck trying to acquire the wiphy mutex. > Feb 26 06:01:45 ct523c-0b0b kernel: 3 locks held by ip/28125: > Feb 26 06:01:45 ct523c-0b0b kernel: #0: ffffffff82a6d588 (rtnl_mutex){+.+.}-{4:4}, at: rtnetlink_rcv_msg+0x13e/0x620 > Feb 26 06:01:45 ct523c-0b0b kernel: #1: ffff888122e60768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: ieee80211_stop+0x2d/0x180 [mac80211] > Feb 26 06:01:45 ct523c-0b0b kernel: #2: ffff88812661d768 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: simple_recursive_removal+0x189/0x2b0 This has them all. johannes