Hello,

On 11/16/2010 05:51 PM, Ben Greear wrote:
>> 1. Try to capture the full dump.  Usually serial console works best.
>
> This was from serial console, and I grabbed everything it printed
> to the screen.  I'll look in /var/log/messages in case there is more
> there.

Yeah, weird.  It doesn't look like it's missing random lines but it
definitely doesn't contain all the tasks.

> If you have a system with an ath5k nic, I should be able to show you
> how to reproduce it, if you're interested.

Unfortunately, I don't have ath5k.  I can order one if it's not too
expensive.  Anything you can recommend?

>> 2. Does adding WQ_MEM_RECLAIM to alloc_ordered_workqueue() call in
>>    ieee80211_register_hw() make any difference?
>>
>> 3. What if you replace it with the following?
>>
>>    alloc_workqueue(wiphy_name(local->hw.wiphy), WQ_NON_REENTRANT, 0)
>
> I can try these things..hopefully today.
>
> Can you explain briefly how this is supposed to work?  I'm certain that some
> workers can be blocked attempting to get rtnl.  When we call flush_work(),
> how is a worker chosen/created to flush that work?

They might not be solutions themselves but they should point to where
the problem is.

flush_work() only flushes the target work.  It waits for the currently
pending or executing work to finish execution.  An ordered workqueue
can execute only a single work at any given time, so if another work is
taking a long time to finish, everything queued to the workqueue will
be delayed.  This is why I asked for the full dump, so that we can find
out who's holding the queue.

The other reason a work execution can be delayed is if there is no
execution resource available due to high memory pressure.  This again
would be distinguishable from the task dump, as rescue workers would be
active and the manager worker would be in the worker creation path.

The two suggested changes modify the workqueue behavior such that each
resolves one of the two issues.
If you set WQ_MEM_RECLAIM, the workqueue allocates a dedicated worker
to use under memory pressure, so an execution resource is guaranteed to
be there.  If you use the alloc_workqueue() call with WQ_NON_REENTRANT
in place of the ordered workqueue, the workqueue can execute multiple
works in parallel, and a single work which takes a long time to finish
won't delay other works queued to the same workqueue.

Thanks.

--
tejun
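
P.S. To make the ordered-queue delay described above concrete, here is
a rough userspace sketch.  It is a toy model and not kernel code: the
function simulate_queue(), the MAX_SLOTS limit and the time units are
all made up for illustration, and every work is assumed to be queued at
time 0 in FIFO order.  The max_active argument is this model's own
parameter (the number of works allowed to run at once); it does not
follow the kernel's conventions for that field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on concurrently running works in this model. */
#define MAX_SLOTS 16

/*
 * Toy model, NOT kernel code.
 *
 * All n works are queued at time 0 in FIFO order.  At most max_active
 * works may run at the same time.  duration[i] is how long work i runs;
 * finish[i] receives the time at which work i completes.
 */
static void simulate_queue(const unsigned *duration, size_t n,
                           size_t max_active, unsigned *finish)
{
    unsigned slot_free[MAX_SLOTS] = { 0 };  /* time each slot becomes idle */
    size_t i, s;

    assert(max_active >= 1 && max_active <= MAX_SLOTS);

    for (i = 0; i < n; i++) {
        size_t best = 0;

        /* The next work in FIFO order takes the slot that frees up first. */
        for (s = 1; s < max_active; s++)
            if (slot_free[s] < slot_free[best])
                best = s;

        finish[i] = slot_free[best] + duration[i];
        slot_free[best] = finish[i];
    }
}
```

With durations {1000, 1, 1} and max_active = 1 (the ordered case), the
two short works finish at 1001 and 1002 because they sit behind the
slow one.  With max_active = 3 they both finish at 1, which is the
effect the third suggestion is meant to expose.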