Hello Behme,

> From: Behme Dirk (CM/ESO2), Sent: Tuesday, October 10, 2023 9:59 PM
>
> On 26.07.2023 05:19, Jakub Kicinski wrote:
> ...
> > The fact that ravb_tx_timeout_work doesn't take any locks seems much
> > more suspicious.
>
> Does anybody plan to look into this, too?

I believe my fix [1] also resolves this issue. Let me explain it in
detail below.

In the thread, Jakub also wrote the following [2]:
---
Simplest fix I can think of is to take a reference on the netdev before
scheduling the work, and then check if it's still registered in the work
itself. Wrap the timeout work in rtnl_lock() to avoid any races there.
---

Sergey suggested adding cancel_work_sync() to ravb_close() [3].
I then investigated the call trace and found that ravb_close() runs
under rtnl_lock() [4], as shown below:
-----------------------------------------------------------------------
ravb_remove() calls unregister_netdev().
-> unregister_netdev() calls rtnl_lock() and unregister_netdevice().
--> unregister_netdevice_queue()
---> unregister_netdevice_many()
----> unregister_netdevice_many_notify()
-----> dev_close_many()
------> __dev_close_many()
-------> ops->ndo_stop()

ravb_close() calls phy_stop()
-> phy_state_machine() with PHY_HALTED
--> phy_link_down()
---> phy_link_change()
----> netif_carrier_off()
-----------------------------------------------------------------------

So, while cancel_work_sync() in ravb_close() waits for
ravb_tx_timeout_work() to be canceled or to finish running, rtnl_lock()
is held. Therefore, no additional locks are needed in
ravb_tx_timeout_work().

[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=3971442870713de527684398416970cf025b4f89
[2] https://lore.kernel.org/netdev/20230727164820.48c9e685@xxxxxxxxxx/
[3] https://lore.kernel.org/netdev/607f4fe4-5a59-39dd-71c2-0cf769b48187@xxxxxx/
[4] https://lore.kernel.org/netdev/OSYPR01MB53341CFDBB49A3BA41A6752CD8F9A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

Best regards,
Yoshihiro Shimoda

> Best regards
>
> Dirk