Hi Rob,

Thanks for your reply. This issue occurred on an embedded ARM system, in a device driver that called of_find_node_by_name(). Below is the kernel log, including the call stack:

[ 650.456107][ T3481] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1637
[ 650.465589][ T3481] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 3481, name: kworker/0:0
[ 650.474970][ T3481] Preemption disabled at:
[ 650.474976][ T3481] [<ffffffd36bb03118>] of_find_node_by_name+0x2c/0x124
[ 650.486191][ T3481] CPU: 0 PID: 3481 Comm: kworker/0:0 Tainted: G OE 5.15.149-debug-gc1dc9fe4253b-dirty #1
[ 650.486208][ T3481] Hardware name: xxxxxxxxxxxxxxxxxxxxxxxxxx
[ 650.486219][ T3481] Workqueue: events_power_efficient phylink_resolve
[ 650.486244][ T3481] Call trace:
[ 650.486249][ T3481]  dump_backtrace+0x0/0x214
[ 650.486271][ T3481]  show_stack+0x18/0x24
[ 650.486287][ T3481]  dump_stack_lvl+0x64/0x7c
[ 650.486305][ T3481]  dump_stack+0x18/0x38
[ 650.486319][ T3481]  ___might_sleep+0x15c/0x180
[ 650.486336][ T3481]  __might_sleep+0x50/0x84
[ 650.486348][ T3481]  down_write+0x28/0x54
[ 650.486364][ T3481]  kernfs_remove+0x38/0x58
[ 650.486381][ T3481]  sysfs_remove_dir+0x54/0x70
[ 650.486396][ T3481]  __kobject_del+0x50/0xe8
[ 650.486413][ T3481]  kobject_cleanup+0x58/0x1e4
[ 650.486427][ T3481]  kobject_put+0x64/0xb0
[ 650.486439][ T3481]  of_node_put+0x1c/0x28
[ 650.486454][ T3481]  of_find_node_by_name+0x74/0x124
[ 650.486466][ T3481]  ethqos_configure_mac_v4+0x13b0/0x1750
[ 650.486485][ T3481]  ethqos_fix_mac_speed+0x48c/0x1174
[ 650.486500][ T3481]  stmmac_mac_link_up+0x25c/0x504
[ 650.486517][ T3481]  phylink_resolve+0x1b4/0x5c0
[ 650.486529][ T3481]  process_one_work+0x1a8/0x3a0
[ 650.486546][ T3481]  worker_thread+0x22c/0x490
[ 650.486559][ T3481]  kthread+0x154/0x218
[ 650.486573][ T3481]  ret_from_fork+0x10/0x20
[ 650.486863][ T3481] BUG: spinlock recursion on CPU#0, kworker/0:0/3481
[ 650.493577][ T3481] lock: 0xffffffd36c5a11e0, .magic: dead4ead, .owner: kworker/0:0/3481, .owner_cpu: 0

________________________________________
From: Rob Herring <robh@xxxxxxxxxx>
Sent: Tuesday, March 11, 2025 21:13
To: Ryder Wang
Cc: devicetree@xxxxxxxxxxxxxxx
Subject: Re: Bug: lock problem for the function of_find_node_by_name

On Sat, Mar 08, 2025 at 10:00:31AM +0000, Ryder Wang wrote:
> It looks like there is a potential bug in a device tree function in
> the kernel code (it does not depend on the kernel version).
>
> The device tree function of_find_node_by_name() calls
> raw_spin_lock_irqsave() to take a lock. It then calls of_node_put()
> before unlocking (raw_spin_unlock_irqrestore()). of_node_put() will
> call kernfs_remove() in some cases. So the problem is here:
> kernfs_remove() always calls down_write(), which might put the
> process to sleep. As we know, sleeping is not allowed between the
> lock and unlock of a spinlock. That is why there is a might_sleep
> check within down_write(), as there is a risk of deadlock or of
> disabling interrupts for too long.
>
> The actual call trace is like this:
> of_find_node_by_name
>   raw_spin_lock_irqsave
>   ...
>   kernfs_remove
>   down_write
>   ...
>   raw_spin_unlock_irqrestore

The bug here would be the reference count going to 0. Do you have a case
or unittest that can trigger this?

Rob
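
To make the point about the reference count concrete, here is a minimal userspace sketch of the locking shape discussed above. It is not kernel code: every model_* name is hypothetical, the "lock" is just a flag, and the release path only stands in for kobject_cleanup() -> kernfs_remove() -> down_write(). It assumes one possible cause, namely a caller passing a 'from' node on which it holds no reference of its own; the thread does not confirm that this is what the driver does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_node {
    int refcount;
    bool released;
};

static bool model_in_atomic;      /* stands in for holding the spinlock */
static int model_sleep_in_atomic; /* counts "sleeping in invalid context" events */

/* Models the release path, which may sleep in the real kernel. */
static void model_release(struct model_node *n)
{
    if (model_in_atomic)
        model_sleep_in_atomic++;  /* would trigger the might_sleep() splat */
    n->released = true;
}

static void model_node_get(struct model_node *n)
{
    n->refcount++;
}

static void model_node_put(struct model_node *n)
{
    /* Release only runs when the last reference is dropped. */
    if (--n->refcount == 0)
        model_release(n);
}

/* Models the shape of of_find_node_by_name(): 'from' is put under the lock. */
static void model_find_from(struct model_node *from)
{
    model_in_atomic = true;       /* raw_spin_lock_irqsave() */
    if (from)
        model_node_put(from);     /* of_node_put(from) */
    model_in_atomic = false;      /* raw_spin_unlock_irqrestore() */
}

/*
 * Runs one scenario and returns the number of violations observed.
 * initial_refs: references held elsewhere (e.g. by the tree itself).
 * caller_takes_ref: whether the caller takes its own reference first.
 */
int model_run(int initial_refs, bool caller_takes_ref)
{
    struct model_node n = { .refcount = initial_refs, .released = false };

    model_sleep_in_atomic = 0;
    if (caller_takes_ref)
        model_node_get(&n);
    model_find_from(&n);
    return model_sleep_in_atomic;
}
```

In this model, model_run(1, true) reports no violation because the put under the lock only drops the caller's own reference, while model_run(1, false) drops the last reference under the lock and reaches the release path in atomic context. That matches the shape of the log above, where of_node_put() ends up in kobject_cleanup() from inside of_find_node_by_name().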