On 8/16/21 9:46 AM, Vladimir Oltean wrote: > Hi Frank, > > On Mon, Aug 16, 2021 at 09:33:03AM -0500, Frank Rowand wrote: >> Hi Vladimir, >> >> On 8/13/21 8:01 PM, Vladimir Oltean wrote: >>> Hi, >>> >>> I was debugging an RCU stall which happened during the probing of a >>> driver. Activating lock debugging, I see: >> >> I took a quick look at sja1105_mdiobus_register() in v5.14-rc1 and v5.14-rc6. >> >> Looking at the following stack trace, I did not see any calls to >> of_find_compatible_node() in sja1105_mdiobus_register(). I am >> guessing that maybe there is an inlined function that calls >> of_find_compatible_node(). This would likely be either >> sja1105_mdiobus_base_tx_register() or sja1105_mdioux_base_t1_register(). > > Yes, it is sja1105_mdiobus_base_t1_register which is inlined. > >>> >>> [ 101.710694] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:938 >>> [ 101.719119] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 1534, name: sh >>> [ 101.726763] INFO: lockdep is turned off. >>> [ 101.730674] irq event stamp: 0 >>> [ 101.733716] hardirqs last enabled at (0): [<0000000000000000>] 0x0 >>> [ 101.739973] hardirqs last disabled at (0): [<ffffd3ebecb10120>] copy_process+0xa78/0x1a98 >>> [ 101.748146] softirqs last enabled at (0): [<ffffd3ebecb10120>] copy_process+0xa78/0x1a98 >>> [ 101.756313] softirqs last disabled at (0): [<0000000000000000>] 0x0 >>> [ 101.762569] CPU: 4 PID: 1534 Comm: sh Not tainted 5.14.0-rc5+ #272 >>> [ 101.774558] Call trace: >>> [ 101.794734] __might_sleep+0x50/0x88 >>> [ 101.798297] __mutex_lock+0x60/0x938 >>> [ 101.801863] mutex_lock_nested+0x38/0x50 >>> [ 101.805775] kernfs_remove+0x2c/0x50 <---- this takes mutex_lock(&kernfs_mutex); >>> [ 101.809341] sysfs_remove_dir+0x54/0x70 >> >> The __kobject_del() occurs only if the refcount on the node >> becomes zero. This should never be true when of_find_compatible_node() >> calls of_node_put() unless a "from" node is passed to of_find_compatible_node(). > > I figured that was the assumption, that the of_node_put would never > trigger a sysfs file / kobject deletion from there. > >> In both sja1105_mdiobus_base_tx_register() and sja1105_mdioux_base_t1_register() >> a from node ("mdio") is passed to of_find_compatible_node() without first doing an >> of_node_get(mdio). If you add the of_node_get() calls the problem should be fixed. > > The answer seems simple enough, but stupid question, but why does > of_find_compatible_node call of_node_put on "from" in the first place? Actually a good question. I do not know why of_find_compatible_node() calls of_node_put() instead of making the caller of of_find_compatible_node() responsible. That pattern was created long before I was involved in devicetree and I have not gone back to read the review comments of when that code was created. -Frank