On Mon, 16 Oct 2006 15:50:55 +0100 (BST) "Maciej W. Rozycki" <macro@xxxxxxxxxxxxxx> wrote: > Andrew, > > > I don't get it. If some code does > > > > rtnl_lock(); > > flush_scheduled_work(); > > > > and there's some work scheduled which does rtnl_lock() then it'll deadlock. > > > > But it'll deadlock whether or not the caller of flush_scheduled_work() is > > keventd. > > > > Calling flush_scheduled_work() under locks is generally a bad idea. > > Indeed -- this is why I avoid it. > > > I'd have thought that was still deadlockable. Could you describe the > > deadlock more completely please? > > The simplest sequence of calls that prevents races here is as follows: > > unregister_netdev(); > rtnl_lock(); > unregister_netdevice(); > dev_close(); > sbmac_close(); > phy_stop(); > phy_disconnect(); > phy_stop_interrupts(); > phy_disable_interrupts(); > flush_scheduled_work(); > free_irq(); > phy_detach(); > mdiobus_unregister(); > rtnl_unlock(); > > We want to call flush_scheduled_work() from phy_stop_interrupts(), because > there may still be calls to phy_change() waiting for the keventd to > process and mdiobus_unregister() frees structures needed by phy_change(). > This may deadlock because of the call to rtnl_lock() though. > > So the modified sequence I have implemented is as follows: > > unregister_netdev(); > rtnl_lock(); > unregister_netdevice(); > dev_close(); > sbmac_close(); > phy_stop(); > schedule_work(); [sbmac_phy_disconnect()] > rtnl_unlock(); > > and then: > > sbmac_phy_disconnect(); > phy_disconnect(); > phy_stop_interrupts(); > phy_disable_interrupts(); > free_irq(); > phy_detach(); > mdiobus_unregister(); > > This guarantees calls to phy_change() for this PHY unit will not be run > after mdiobus_unregister(), because any such calls have been placed in the > queue before sbmac_phy_disconnect() (phy_stop() prevents the interrupt > handler from issuing new calls to phy_change()). > > We still need flush_scheduled_work() to be called from > phy_stop_interrupts() though, in case some other MAC driver calls > phy_disconnect() (or phy_stop_interrupts(), depending on the abstraction > layer of phylib used) directly rather than using keventd. This is > possible if phy_disconnect() is called from the driver's module_exit() > call, which may make sense for devices that are known not to have their > MII interface available as an external connector. Hence the: > > if (!current_is_keventd()) > flush_scheduled_work(); > > sequence in phy_stop_interrupts(). Of course, we can force all drivers > using phylib (in the interrupt mode) to call phy_disconnect() through > keventd instead. > > Does it sound clearer? > Vaguely. Why doesn't it deadlock if !current_is_keventd()? I mean, whether or not the caller is keventd, the flush_scheduled_work() caller will still be dependent upon rtnl_lock() being acquirable.