Hello Florian, On Tue, Mar 12, 2024 at 7:48 PM Boqun Feng <boqun.feng@xxxxxxxxx> wrote: > > On Tue, Mar 12, 2024 at 02:44:14PM -0700, Linus Torvalds wrote: > > On Tue, 12 Mar 2024 at 14:34, Florian Fainelli <f.fainelli@xxxxxxxxx> wrote: > > > > > > and here is a log where this fails: > > > > > > https://gist.github.com/ffainelli/ed08a2b3e853f59343786ebd20364fc8 > > > > You could try the 'initcall_debug' kernel command line. > > > > Right, that'll be helpful. > > Besides I took a look at the config Florian shared, no TASKS_RCU, > RCU_LAZY or RCU nocb is enabled. So probably the only left changes in > the PR are around RCU expedited. Florian, could you see if you can build > and test with CONFIG_PROVE_RCU=y (you need to select > CONFIG_PROVE_LOCKING for that)? That'll call synchronize_rcu() + > synchronize_rcu_expedited() before and after we switch > rcu_scheduler_active, and it may provide more information. Thanks! Adding to everyone's suggestions, could you also try booting with "rcupdate.rcu_normal=1" ? This will disable expedited RCU and help us further confirm that it is indeed expedited RCU (and then we can look into fixing that). Also there are 2 additional users of expedited RCU in this release I noticed: 78c3253f27e5 ("net: use synchronize_rcu_expedited in cleanup_net()") 1ebb85f9c03d ("netfilter: conntrack: expedite rcu in nf_conntrack_cleanup_net_list") Could you also try reverting those patches as well, and see if the issue goes away? thanks, - Joel On Tue, Mar 12, 2024 at 7:48 PM Boqun Feng <boqun.feng@xxxxxxxxx> wrote: > > On Tue, Mar 12, 2024 at 02:44:14PM -0700, Linus Torvalds wrote: > > On Tue, 12 Mar 2024 at 14:34, Florian Fainelli <f.fainelli@xxxxxxxxx> wrote: > > > > > > and here is a log where this fails: > > > > > > https://gist.github.com/ffainelli/ed08a2b3e853f59343786ebd20364fc8 > > > > You could try the 'initcall_debug' kernel command line. > > > > Right, that'll be helpful. > > Besides I took a look at the config Florian shared, no TASKS_RCU, > RCU_LAZY or RCU nocb is enabled. So probably the only left changes in > the PR are around RCU expedited. Florian, could you see if you can build > and test with CONFIG_PROVE_RCU=y (you need to select > CONFIG_PROVE_LOCKING for that)? That'll call synchronize_rcu() + > synchronize_rcu_expedited() before and after we switch > rcu_scheduler_active, and it may provide more information. Thanks! > > Regards, > Boqun > > > It will make the above *much* noisier, but it might - thanks to all > > the new noise - show exactly *what* is being crazy slow to initialize. > > > > Because right now it's just radio silence in between those > > > > [ 1.926435] bcmgenet f0480000.ethernet: GENET 5.0 EPHY: 0x0000 > > [ 162.148135] unimac-mdio unimac-mdio.0: Broadcom UniMAC MDIO bus > > > > things, and that's presumably because some random initcall there just > > takes forever to time out. > > > > Linus