Re: Bug: lock problem for the function of_find_node_by_name

Hi Rob,

Thanks for your reply.

This issue occurred on an embedded ARM system, in a device driver that calls of_find_node_by_name(). Below is the kernel log, including the call stack:

    [  650.456107][ T3481] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1637
    [  650.465589][ T3481] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 3481, name: kworker/0:0
    [  650.474970][ T3481] Preemption disabled at:
    [  650.474976][ T3481] [<ffffffd36bb03118>] of_find_node_by_name+0x2c/0x124
    [  650.486191][ T3481] CPU: 0 PID: 3481 Comm: kworker/0:0 Tainted: G           OE     5.15.149-debug-gc1dc9fe4253b-dirty #1
    [  650.486208][ T3481] Hardware name: xxxxxxxxxxxxxxxxxxxxxxxxxx
    [  650.486219][ T3481] Workqueue: events_power_efficient phylink_resolve
    [  650.486244][ T3481] Call trace:
    [  650.486249][ T3481]  dump_backtrace+0x0/0x214
    [  650.486271][ T3481]  show_stack+0x18/0x24
    [  650.486287][ T3481]  dump_stack_lvl+0x64/0x7c
    [  650.486305][ T3481]  dump_stack+0x18/0x38
    [  650.486319][ T3481]  ___might_sleep+0x15c/0x180
    [  650.486336][ T3481]  __might_sleep+0x50/0x84
    [  650.486348][ T3481]  down_write+0x28/0x54
    [  650.486364][ T3481]  kernfs_remove+0x38/0x58
    [  650.486381][ T3481]  sysfs_remove_dir+0x54/0x70
    [  650.486396][ T3481]  __kobject_del+0x50/0xe8
    [  650.486413][ T3481]  kobject_cleanup+0x58/0x1e4
    [  650.486427][ T3481]  kobject_put+0x64/0xb0
    [  650.486439][ T3481]  of_node_put+0x1c/0x28
    [  650.486454][ T3481]  of_find_node_by_name+0x74/0x124
    [  650.486466][ T3481]  ethqos_configure_mac_v4+0x13b0/0x1750
    [  650.486485][ T3481]  ethqos_fix_mac_speed+0x48c/0x1174
    [  650.486500][ T3481]  stmmac_mac_link_up+0x25c/0x504
    [  650.486517][ T3481]  phylink_resolve+0x1b4/0x5c0
    [  650.486529][ T3481]  process_one_work+0x1a8/0x3a0
    [  650.486546][ T3481]  worker_thread+0x22c/0x490
    [  650.486559][ T3481]  kthread+0x154/0x218
    [  650.486573][ T3481]  ret_from_fork+0x10/0x20
    [  650.486863][ T3481] BUG: spinlock recursion on CPU#0, kworker/0:0/3481
    [  650.493577][ T3481]  lock: 0xffffffd36c5a11e0, .magic: dead4ead, .owner: kworker/0:0/3481, .owner_cpu: 0
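To check my understanding of your point that the real bug is the reference count reaching 0, here is a small user-space toy model. It is only a sketch under assumptions, not kernel code: struct toy_node, toy_node_get(), toy_node_put(), toy_find_node_by_name() and demo() are hypothetical names, and the lock_held flag merely stands in for devtree_lock. The model copies the shape of of_find_node_by_name() as described in my quoted message below, which drops a reference on `from` before releasing the lock. If the caller passes a `from` node it holds no reference to, the count can hit 0 inside the critical section. In the kernel, that is where the release path reaches kernfs_remove() -> down_write().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space toy model; none of these names are kernel APIs. */
struct toy_node {
    const char *name;
    int refcount;          /* stands in for the node's kobject refcount */
    struct toy_node *next; /* stands in for the "all nodes" list */
};

static bool lock_held;         /* stands in for devtree_lock being held */
static int release_under_lock; /* release path entered with the lock held */

static void toy_node_release(struct toy_node *np)
{
    /* In the kernel, this is where kobject_cleanup() leads to
     * kernfs_remove() -> down_write(), which may sleep. */
    (void)np;
    if (lock_held)
        release_under_lock++;
}

static struct toy_node *toy_node_get(struct toy_node *np)
{
    if (np)
        np->refcount++;
    return np;
}

static void toy_node_put(struct toy_node *np)
{
    if (np && --np->refcount == 0)
        toy_node_release(np);
}

/* Same shape as of_find_node_by_name(): start after 'from', take a
 * reference on the match, and drop a reference on 'from' before the
 * lock is released. */
static struct toy_node *toy_find_node_by_name(struct toy_node *head,
                                              struct toy_node *from,
                                              const char *name)
{
    struct toy_node *np;

    lock_held = true;
    for (np = from ? from->next : head; np; np = np->next)
        if (strcmp(np->name, name) == 0 && toy_node_get(np))
            break;
    toy_node_put(from); /* may drop the last reference on 'from' */
    lock_held = false;
    return np;
}

/* Returns how often the release path ran with the lock held.
 * caller_owns_ref == false models a caller passing a borrowed 'from'. */
static int demo(bool caller_owns_ref)
{
    /* Each node starts with one reference, owned by the tree itself. */
    struct toy_node c = { "other", 1, NULL };
    struct toy_node b = { "target", 1, &c };
    struct toy_node a = { "start", 1, &b };
    struct toy_node *found;

    release_under_lock = 0;
    if (caller_owns_ref)
        toy_node_get(&a); /* take our own reference before handing it over */
    found = toy_find_node_by_name(&a, &a, "target");
    assert(found == &b);
    toy_node_put(found); /* drop the reference the lookup returned */
    return release_under_lock;
}
```

With caller_owns_ref set to false, demo() returns 1 (the release path ran while the lock was held); with it set to true, it returns 0. On the driver side, this corresponds to passing NULL as `from`, or calling of_node_get() on `from` first, because of_find_node_by_name() always calls of_node_put() on the `from` node it is given.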

________________________________________
From: Rob Herring <robh@xxxxxxxxxx>
Sent: Tuesday, March 11, 2025 21:13
To: Ryder Wang
Cc: devicetree@xxxxxxxxxxxxxxx
Subject: Re: Bug: lock problem for the function of_find_node_by_name

On Sat, Mar 08, 2025 at 10:00:31AM +0000, Ryder Wang wrote:
> It looks like there is a potential bug in a device tree function in
> the kernel (it does not depend on the kernel version).
>
> The device tree function of_find_node_by_name() calls
> raw_spin_lock_irqsave() to take a lock. It then calls of_node_put()
> before unlocking with raw_spin_unlock_irqrestore(). In some cases,
> of_node_put() ends up calling kernfs_remove(). The problem is here:
> kernfs_remove() always calls down_write(), which might put the process
> to sleep. As we know, sleeping is not allowed while a spinlock is held.
> That is why down_write() contains a might_sleep() check: sleeping there
> risks a deadlock or keeps interrupts disabled for too long.
>
> The actual call trace is like this:
> of_find_node_by_name
>     raw_spin_lock_irqsave
>         ...
>         kernfs_remove
>             down_write
>         ...
>     raw_spin_unlock_irqrestore

The bug here would be the reference count going to 0. Do you have a
case or unittest that can trigger this?

Rob
