Re: [patch 3/6] 2.6.18: sb1250-mac: Phylib IRQ handling fixes

"Maciej W. Rozycki" <macro@xxxxxxxxxxxxxx> · Mon, 16 Oct 2006 15:50:55 +0100 (BST)

Andrew,

> I don't get it.  If some code does
> 
> 	rtnl_lock();
> 	flush_scheduled_work();
> 
> and there's some work scheduled which does rtnl_lock() then it'll deadlock.
> 
> But it'll deadlock whether or not the caller of flush_scheduled_work() is
> keventd.
> 
> Calling flush_scheduled_work() under locks is generally a bad idea.

 Indeed -- this is why I avoid it.

> I'd have thought that was still deadlockable.  Could you describe the
> deadlock more completely please?

 The simplest sequence of calls that prevents races here is as follows:

unregister_netdev();
  rtnl_lock();
  unregister_netdevice();
    dev_close();
      sbmac_close();
        phy_stop();
        phy_disconnect();
          phy_stop_interrupts();
            phy_disable_interrupts();
            flush_scheduled_work();
            free_irq();
          phy_detach();
        mdiobus_unregister();
  rtnl_unlock();

We want to call flush_scheduled_work() from phy_stop_interrupts(), because 
there may still be calls to phy_change() waiting for the keventd to 
process and mdiobus_unregister() frees structures needed by phy_change().  
This may deadlock because of the call to rtnl_lock() though.

 So the modified sequence I have implemented is as follows:

unregister_netdev();
  rtnl_lock();
  unregister_netdevice();
    dev_close();
      sbmac_close();
        phy_stop();
        schedule_work(); [sbmac_phy_disconnect()]
  rtnl_unlock();

and then:

sbmac_phy_disconnect();
  phy_disconnect();
    phy_stop_interrupts();
      phy_disable_interrupts();
      free_irq();
    phy_detach();
  mdiobus_unregister();

This guarantees calls to phy_change() for this PHY unit will not be run 
after mdiobus_unregister(), because any such calls have been placed in the 
queue before sbmac_phy_disconnect() (phy_stop() prevents the interrupt 
handler from issuing new calls to phy_change()).

 We still need flush_scheduled_work() to be called from 
phy_stop_interrupts() though, in case some other MAC driver calls 
phy_disconnect() (or phy_stop_interrupts(), depending on the abstraction 
layer of phylib used) directly rather than using keventd.  This is 
possible if phy_disconnect() is called from the driver's module_exit() 
call, which may make sense for devices that are known not to have their 
MII interface available as an external connector.  Hence the:

if (!current_is_keventd())
  flush_scheduled_work();

sequence in phy_stop_interrupts().  Of course, we can force all drivers 
using phylib (in the interrupt mode) to call phy_disconnect() through 
keventd instead.

 Does it sound clearer?

  Maciej