On Tue, Mar 11, 2025 at 8:41 PM Ryder Wang <rydercoding@xxxxxxxxxxx> wrote:
>
> Hi Rob,
>
> Thanks for your reply.

Please don't top post.

> This issue occurred on some embedded ARM system for some device driver which called of_find_node_by_name. Below is the kernel log including the call stack:
>
> [ 650.456107][ T3481] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1637
> [ 650.465589][ T3481] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 3481, name: kworker/0:0
> [ 650.474970][ T3481] Preemption disabled at:
> [ 650.474976][ T3481] [<ffffffd36bb03118>] of_find_node_by_name+0x2c/0x124
> [ 650.486191][ T3481] CPU: 0 PID: 3481 Comm: kworker/0:0 Tainted: G OE 5.15.149-debug-gc1dc9fe4253b-dirty #1
> [ 650.486208][ T3481] Hardware name: xxxxxxxxxxxxxxxxxxxxxxxxxx
> [ 650.486219][ T3481] Workqueue: events_power_efficient phylink_resolve
> [ 650.486244][ T3481] Call trace:
> [ 650.486249][ T3481] dump_backtrace+0x0/0x214
> [ 650.486271][ T3481] show_stack+0x18/0x24
> [ 650.486287][ T3481] dump_stack_lvl+0x64/0x7c
> [ 650.486305][ T3481] dump_stack+0x18/0x38
> [ 650.486319][ T3481] ___might_sleep+0x15c/0x180
> [ 650.486336][ T3481] __might_sleep+0x50/0x84
> [ 650.486348][ T3481] down_write+0x28/0x54
> [ 650.486364][ T3481] kernfs_remove+0x38/0x58
> [ 650.486381][ T3481] sysfs_remove_dir+0x54/0x70
> [ 650.486396][ T3481] __kobject_del+0x50/0xe8
> [ 650.486413][ T3481] kobject_cleanup+0x58/0x1e4
> [ 650.486427][ T3481] kobject_put+0x64/0xb0
> [ 650.486439][ T3481] of_node_put+0x1c/0x28
> [ 650.486454][ T3481] of_find_node_by_name+0x74/0x124
> [ 650.486466][ T3481] ethqos_configure_mac_v4+0x13b0/0x1750

Not a function in mainline...

The assumption with of_find_node_by_name and all the dt functions that operate as iterators is you do a get on the 1st node before calling the 1st time, and then they all do a get on the next node and a put on the previous node.
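
To illustrate that get/put convention, here is a minimal userspace sketch, not kernel code. struct node, node_get(), node_put(), find_node_by_name() and count_matches() are hypothetical stand-ins for struct device_node, of_node_get(), of_node_put() and of_find_node_by_name(); the extra head parameter and the plain integer refcount are simplifications assumed only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct device_node. */
struct node {
	const char *name;
	int refcount;
	struct node *next;	/* flat "all nodes" order */
};

/* Stand-in for of_node_get(): take a reference, NULL is allowed. */
static struct node *node_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Stand-in for of_node_put(): drop a reference, NULL is allowed. */
static void node_put(struct node *n)
{
	if (n) {
		/* An unbalanced put is a caller bug. */
		assert(n->refcount > 0);
		n->refcount--;
	}
}

/*
 * Models the iterator contract: search after 'from' (or from 'head'
 * when 'from' is NULL), take a reference on the node that is returned,
 * and drop the caller's reference on 'from'.
 */
static struct node *find_node_by_name(struct node *head, struct node *from,
				      const char *name)
{
	struct node *n;

	for (n = from ? from->next : head; n; n = n->next) {
		if (strcmp(n->name, name) == 0) {
			node_get(n);
			break;
		}
	}
	node_put(from);
	return n;
}

/*
 * Caller side: take a reference on the start node before the first
 * call. After that, each call consumes the reference held on the
 * previous node, so the loop needs no explicit put.
 */
static int count_matches(struct node *head, struct node *start,
			 const char *name)
{
	struct node *n = node_get(start);
	int count = 0;

	while ((n = find_node_by_name(head, n, name)) != NULL)
		count++;
	return count;
}
```

With a list phy -> mdio -> phy, count_matches(head, NULL, "phy") finds two nodes and count_matches(head, head, "phy") finds one, and every refcount ends at its starting value. If the caller skips the initial get, the put on the previous node drops a reference the caller never owned, and in the kernel that put runs with the spinlock held, which is the path shown in the trace above.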

We could move the put out of the spinlock, but then you might not find the bug in the caller. Also, all the iterator functions do the same thing.

One thing I noticed is for_each_of_allnodes_from() is not safe to call outside the spinlock and we have 1 user doing that (drivers/clk/ti/clk.c).

Rob