On Tue, Nov 29, 2022 at 04:12:09PM +0200, Vladimir Oltean wrote: > This patch set deliberately targets net-next and lacks Fixes: tags due > to caution on my part. > > While testing some SFP modules on the Solidrun Honeycomb LX2K platform, > I noticed that rebooting causes a deadlock: > > ============================================ > WARNING: possible recursive locking detected > 6.1.0-rc5-07010-ga9b9500ffaac-dirty #656 Not tainted > -------------------------------------------- > systemd-shutdow/1 is trying to acquire lock: > ffffa62db6cf42f0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x1c/0x30 > > but task is already holding lock: > ffffa62db6cf42f0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x1c/0x30 > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(rtnl_mutex); > lock(rtnl_mutex); > > *** DEADLOCK *** > > May be due to missing lock nesting notation > > 6 locks held by systemd-shutdow/1: > #0: ffffa62db6863c70 (system_transition_mutex){+.+.}-{4:4}, at: __do_sys_reboot+0xd4/0x260 > #1: ffff2f2b0176f100 (&dev->mutex){....}-{4:4}, at: device_shutdown+0xf4/0x260 > #2: ffff2f2b017be900 (&dev->mutex){....}-{4:4}, at: device_shutdown+0x104/0x260 > #3: ffff2f2b017680f0 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x40/0x260 > #4: ffff2f2b0e1608f0 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x40/0x260 > #5: ffffa62db6cf42f0 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock+0x1c/0x30 > > stack backtrace: > CPU: 6 PID: 1 Comm: systemd-shutdow Not tainted 6.1.0-rc5-07010-ga9b9500ffaac-dirty #656 > Hardware name: SolidRun LX2160A Honeycomb (DT) > Call trace: > lock_acquire+0x68/0x84 > __mutex_lock+0x98/0x460 > mutex_lock_nested+0x2c/0x40 > rtnl_lock+0x1c/0x30 > sfp_bus_del_upstream+0x1c/0xac > phylink_destroy+0x1c/0x50 > dpaa2_mac_disconnect+0x28/0x70 > dpaa2_eth_remove+0x1dc/0x1f0 > fsl_mc_driver_remove+0x24/0x60 > device_remove+0x70/0x80 > device_release_driver_internal+0x1f0/0x260 > device_links_unbind_consumers+0xe0/0x110 > device_release_driver_internal+0x138/0x260 > device_release_driver+0x18/0x24 > bus_remove_device+0x12c/0x13c > device_del+0x16c/0x424 > fsl_mc_device_remove+0x28/0x40 > __fsl_mc_device_remove+0x10/0x20 > device_for_each_child+0x5c/0xac > dprc_remove+0x94/0xb4 > fsl_mc_driver_remove+0x24/0x60 > device_remove+0x70/0x80 > device_release_driver_internal+0x1f0/0x260 > device_release_driver+0x18/0x24 > bus_remove_device+0x12c/0x13c > device_del+0x16c/0x424 > fsl_mc_bus_remove+0x8c/0x10c > fsl_mc_bus_shutdown+0x10/0x20 > platform_shutdown+0x24/0x3c > device_shutdown+0x15c/0x260 > kernel_restart+0x40/0xa4 > __do_sys_reboot+0x1e4/0x260 > __arm64_sys_reboot+0x24/0x30 > > But fixing this appears to be not so simple. The patch set represents my > attempt to address it. > > In short, the problem is that dpaa2_mac_connect() and dpaa2_mac_disconnect() > call 2 phylink functions in a row, one takes rtnl_lock() itself - > phylink_create(), and one which requires rtnl_lock() to be held by the > caller - phylink_fwnode_phy_connect(). The existing approach in the > drivers is too simple. We take rtnl_lock() when calling dpaa2_mac_connect(), > which is what results in the deadlock. > > Fixing just that creates another problem. The drivers make use of > rtnl_lock() for serializing with other code paths too. I think I've > found all those code paths, and established other mechanisms for > serializing with them. > > Vladimir Oltean (12): > net: dpaa2-eth: don't use -ENOTSUPP error code > net: dpaa2: replace dpaa2_mac_is_type_fixed() with > dpaa2_mac_is_type_phy() > net: dpaa2-mac: absorb phylink_start() call into dpaa2_mac_start() > net: dpaa2-mac: remove defensive check in dpaa2_mac_disconnect() > net: dpaa2-eth: assign priv->mac after dpaa2_mac_connect() call > net: dpaa2-switch: assign port_priv->mac after dpaa2_mac_connect() > call > net: dpaa2: publish MAC stringset to ethtool -S even if MAC is missing > net: dpaa2-switch replace direct MAC access with > dpaa2_switch_port_has_mac() > net: dpaa2-eth: connect to MAC before requesting the "endpoint > changed" IRQ > net: dpaa2-eth: serialize changes to priv->mac with a mutex > net: dpaa2-switch: serialize changes to priv->mac with a mutex > net: dpaa2-mac: move rtnl_lock() only around > phylink_{,dis}connect_phy() > > .../freescale/dpaa2/mac-phy-support.rst | 9 +- > .../net/ethernet/freescale/dpaa2/dpaa2-eth.c | 87 ++++++++++++------- > .../net/ethernet/freescale/dpaa2/dpaa2-eth.h | 11 +-- > .../ethernet/freescale/dpaa2/dpaa2-ethtool.c | 70 ++++++++++----- > .../net/ethernet/freescale/dpaa2/dpaa2-mac.c | 16 +++- > .../net/ethernet/freescale/dpaa2/dpaa2-mac.h | 10 ++- > .../freescale/dpaa2/dpaa2-switch-ethtool.c | 45 ++++++---- > .../ethernet/freescale/dpaa2/dpaa2-switch.c | 59 +++++++++---- > .../ethernet/freescale/dpaa2/dpaa2-switch.h | 9 +- > 9 files changed, 212 insertions(+), 104 deletions(-) For the entire patch set: Reviewed-by: Ioana Ciornei <ioana.ciornei@xxxxxxx> Tested-by: Ioana Ciornei <ioana.ciornei@xxxxxxx>