Re: ath5k/mac80211: Reproducible deadlock with 64-stations.

Tejun Heo <tj@xxxxxxxxxx> · Fri, 12 Nov 2010 11:11:09 +0100

Hello,

On 11/12/2010 12:12 AM, Ben Greear wrote:
> The lockup (or extreme slowdown?) happens before the
> serious memory pressure.
> 
> One thing I noticed is that at one point near (at?) the beginning
> of the slowdown, it took 36-seconds to complete the
> flush_work() call in ieee80211_do_stop in iface.c
> 
> From some printk's I added:
> 
> Nov 11 14:58:13 localhost kernel: do_stop: sta14 flushing work: e51298b4
> Nov 11 14:58:49 localhost kernel: do_stop: sta14 flushed.
> 
> It is holding RTNL for this entire time, which of course stops
> a large number of other useful processes from making
> progress.
> 
> Is there any good reason for the flush to take so long?

It depends on what the work being flushed was doing.  Which one is it
trying to flush?  Also, if memory pressure is high enough, the dynamic
nature of the workqueue means that processing of works can be delayed
while it tries to create new workers to process them.  Situations like
that usually don't happen often, as workers are likely to get freed
up as other works finish; however, if workers are piling up on
rtnl_lock, there really isn't much the workqueue can do.  If there is a
work user which can behave like that, it would be a good idea to
restrict its maximum concurrency using a separate workqueue.
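
To illustrate the point about restricting concurrency, here is a rough
userspace analogy in Python, not kernel code.  Everything in it is
made up for illustration: a fixed-size thread pool stands in for the
case where no new workers can be created, a plain lock stands in for
rtnl_lock, and the names (rtnl, needs_rtnl,
unrelated_runs_while_lock_held) are hypothetical.  The real workqueue
grows its worker pool dynamically, so take this only as a sketch of
the limiting case.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def unrelated_runs_while_lock_held(use_dedicated_pool,
                                   shared_workers=4, lock_users=4):
    """Return True if an unrelated work item ran while the lock was held."""
    rtnl = threading.Lock()      # stand-in for rtnl_lock (assumption)
    done = threading.Event()     # set by the unrelated work item

    def needs_rtnl():
        # Work item that blocks until the lock holder lets go.
        with rtnl:
            pass

    shared = ThreadPoolExecutor(max_workers=shared_workers)
    dedicated = ThreadPoolExecutor(max_workers=1) if use_dedicated_pool else None
    target = dedicated if use_dedicated_pool else shared

    rtnl.acquire()               # somebody holds the lock for a long time
    try:
        # Queue the lock-taking work first, then one unrelated item.
        for _ in range(lock_users):
            target.submit(needs_rtnl)
        shared.submit(done.set)
        # Did the unrelated item get a worker while the lock was held?
        ran = done.wait(timeout=0.5)
    finally:
        rtnl.release()
        shared.shutdown(wait=True)
        if dedicated is not None:
            dedicated.shutdown(wait=True)
    return ran


if __name__ == "__main__":
    print("shared pool only:", unrelated_runs_while_lock_held(False))
    print("dedicated pool:  ", unrelated_runs_while_lock_held(True))
```

With only the shared pool, all four workers end up sleeping on the
lock and the unrelated item has to wait for it to be released.  With
the lock users confined to their own single-worker pool, at most one
worker is ever stuck on the lock and the shared pool stays available.
In the kernel, the comparable knob would be the max_active argument of
alloc_workqueue() on a workqueue used only for that work.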

Thanks.

-- 
tejun