Hello, Johannes.

On 12/09/2010 03:42 PM, Johannes Berg wrote:
> On Thu, 2010-12-09 at 15:34 +0100, Tejun Heo wrote:
>
>>  [<78447ce1>] flush_work+0x23/0x27
>>  [<f91a2646>] ieee80211_do_stop+0x25c/0x403 [mac80211]
>
>>  [<787001fe>] rtnetlink_rcv+0x1b/0x22    <- rtnl lock
>
> Right, so we're flushing here under RTNL ... I believe this is the one
> that Ben hacked up to not flush or so?

He changed it to cancel instead of flush.

>>  [<7878cdab>] _cond_resched+0x2b/0x44
>>  [<7878d84f>] mutex_lock_nested+0x22/0x3b
>>  [<f919fddc>] ieee80211_sta_rx_queued_mgmt+0x2d/0x3a6 [mac80211]
>>  [<f91a2f53>] ieee80211_iface_work+0x1ff/0x282 [mac80211]
>
>> But, sdata->work is busy running ieee80211_iface_work().  I _suspect_
>> for some reason iee80211_iface_work() isn't finishing.
>
> It's trying to acquire a mutex here, which must be &ifmgd->mtx or
> &local->mtx, but neither of them ever nest around the RTNL.

Yeah, but the task state is 'R', not 'D', and no one else is holding
the lock.  It looks more like ieee80211_iface_work() is looping
constantly.

>> That, or, the new flush_work() implementation is broken and it's
>> failing to flush when a work is being executed back to back.  I'll
>> prep a debug patch to determine what's going on.
>
> Thanks.
>
> I wonder if Ben can attempt to reproduce this using compat-wireless
> against a kernel that doesn't have the workqueue changes, was the last
> one without those 2.6.34? 2.6.35?

As I think we're now pretty close to where the problem is, I'd like to
try a few things before going down that path.

>> The rest of the system going down the toilet after this is mostly
>> caused by the held rtnl_lock above.
>
> Indeed, the rtnl is pretty important :-)

Heh, yeah, it's one of the most widely used mutexes.  It's scary. :-)

--
tejun