Hello Sergey, > From: Sergey Shtylyov, Sent: Thursday, October 19, 2023 2:27 AM > > On 10/18/23 12:39 PM, Yoshihiro Shimoda wrote: > [...] > >>> Fix races between ravb_tx_timeout_work() and functions of net_device_ops > >>> and ethtool_ops by using rtnl_trylock() and rtnl_unlock(). Note that > >>> since ravb_close() is under the rtnl lock and calls cancel_work_sync(), > >>> ravb_tx_timeout_work() calls rtnl_trylock() to avoid a deadlock. > >> > >> I don't quite follow... how calling cancel_work_sync() is a problem? > >> I thought the problem was that unregister_netdev() can be called with > >> the TX timeout work still pending? And, more generally, shouldn't we > >> protect against the TX timeout work being executed on a different CPU > >> than the {net_device|ethtool}_ops methods are being executed (is that > >> possible?)? > > > > __dev_close_many() in net/core/dev.c calls ASSERT_RTNL() and ops->ndo_stop(). > > So, the ravb_close() is under rtnl lock. While locking the rtnl, it's > > possible to call ravb_tx_timeout_work() on other CPU. In such a case, > > it's possible to cause a deadlock between ravb_close() and ravb_tx_timeout_work() > > > > CPU0 CPU1 > > ravb_tx_timeout() > > schedule_work() > > ... > > __dev_close_many() > > // this is under rtnl lock > > ravb_close() > > cancel_work_sync() > > ravb_tx_timeout_work() > > rtnl_lock() > > // this is possible to cause a deadlock > > Ah, cancel_work_sync() means we have to wait for the work to > finish -- indeed a deadlock is possiblet then. Thank you for your reply. I'll update commit description in v2. > >> I also had a suspicion that we still miss the spinlock calls in > >> ravb_tx_timeout_work() but nothing in particular jumped at me... > > We mainly need to protect against the interrupts in this case... I think so. However, we can not use spin_lock_irqsave() for whole this ravb_tx_timeout_work() because ravb_ring_init() calls kcalloc() with GFP_KERNEL. > >> mind looking into that? :-) > > > > Yes, perhaps we have to check it somehow... > > Unfortunately, I don't seem to have no bandwidth to do that myself... I got it. I'll investigate this later. Best regards, Yoshihiro Shimoda > [...] > > MBR, Sergey