Re: [PATCH] mac80211: Fix deadlock in ieee80211_do_stop.

On 11/17/2010 12:55 AM, Tejun Heo wrote:
Hello,

On 11/16/2010 05:51 PM, Ben Greear wrote:
1. Try to capture the full dump.  Usually serial console works best.

This was from serial console, and I grabbed everything it printed
to the screen.  I'll look in /var/log/messages in case there is more
there.

Yeah, weird.  It doesn't look like it's missing random lines but it
definitely doesn't contain all the tasks.

If you have a system with an ath5k nic, I should be able to show you
how to reproduce it, if you're interested.

Unfortunately, I don't have ath5k.  I can order one if it's not too
expensive.  Anything you can recommend?

2. Does adding WQ_MEM_RECLAIM to alloc_ordered_workqueue() call in
     ieee80211_register_hw() make any difference?

3. What if you replace it with the following?

     alloc_workqueue(wiphy_name(local->hw.wiphy), WQ_NON_REENTRANT, 0)

I can try these things, hopefully today.

Can you explain briefly how this is supposed to work?  I'm certain that some
workers can be blocked attempting to get rtnl.  When we call flush_work(),
how is a worker chosen/created to flush that work?

They might not be solutions in themselves, but they should point to where the
problem is.  flush_work() only flushes the target work.  It waits for the
currently pending or executing work to finish execution.  An ordered
workqueue can execute only a single work at any given time, so if
another work is taking a long time to finish, everything queued to the
workqueue will be delayed.  This is why I asked for the full dump so

Well, from the lockdep and stack traces, we can be certain that at least one
of the workers is blocked trying to lock RTNL.  That worker cannot
finish until the flush_work() completes, since the flush_work() caller
already owns RTNL.  If flush_work() is waiting on that worker to finish,
then it's a deadlock.

that we can find out who's holding the queue.  The other reason a work
execution can be delayed is if there is no execution resource
available due to high memory pressure.  This again will be
distinguishable from the task dump, as rescue workers would be active and
the manager worker would be in the worker creation path.

I have plenty of memory available when this problem starts.
(I doubled memory to 2GB of low-memory and the problem persists.)
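The deadlock argument above (a worker blocked on RTNL while the flush_work() caller holds RTNL and waits for that worker) can be reproduced in a small userspace model. This is a sketch only, not kernel code: a pthread mutex stands in for RTNL, one thread stands in for the workqueue worker, and the hypothetical flush_with_timeout() plays the role of flush_work() but gives up after a timeout so the model reports the deadlock instead of hanging. It models the simplest case, where the flushed work itself takes RTNL; with an ordered workqueue the same thing happens if the flushed work is merely queued behind one that does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Userspace model only: this mutex stands in for RTNL. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Completion state of the single work item being flushed. */
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static bool work_done;

/* The work item: it needs RTNL, like the blocked worker in the traces. */
static void *work_fn(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&rtnl);	/* blocks while the flusher holds RTNL */
	pthread_mutex_unlock(&rtnl);

	pthread_mutex_lock(&done_lock);
	work_done = true;
	pthread_cond_signal(&done_cond);
	pthread_mutex_unlock(&done_lock);
	return NULL;
}

/* Hypothetical flush_work() stand-in: wait for the work to finish,
 * but give up after timeout_ms. Returns true if the work finished. */
static bool flush_with_timeout(long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;
	bool finished;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&done_lock);
	while (!work_done && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&done_cond, &done_lock, &deadline);
	finished = work_done;
	pthread_mutex_unlock(&done_lock);
	return finished;
}

/* Queue the work, then flush it, optionally while holding RTNL.
 * Returns true if the flush could not complete (the deadlock case). */
static bool flush_deadlocks(bool hold_rtnl)
{
	pthread_t worker;
	bool finished;

	work_done = false;
	if (hold_rtnl)
		pthread_mutex_lock(&rtnl);
	pthread_create(&worker, NULL, work_fn, NULL);

	finished = flush_with_timeout(500);

	if (hold_rtnl)
		pthread_mutex_unlock(&rtnl);	/* lets the model's worker exit */
	pthread_join(worker, NULL);
	return !finished;
}
```

In this model flush_deadlocks(true) reports a deadlock, while flush_deadlocks(false) completes normally, because in the second case nothing stops the worker from taking RTNL. In the kernel there is no timeout, so the first case simply hangs.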

The two suggested changes modify the workqueue behavior such that each
resolves one of the two issues.  If you set WQ_MEM_RECLAIM, workqueue
allocates a dedicated worker to use under memory pressure, so
execution resource is guaranteed to be there.  If you use
WQ_NON_REENTRANT, workqueue would execute multiple works in parallel
and a single work which takes a long time to finish won't delay other
works queued to the same workqueue.
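The point about ordered workqueues can be illustrated with a toy timing model. This sketch assumes every work item is queued at time 0, that an ordered workqueue runs them strictly one after another, and (for the non-ordered case) that enough workers are available to start every item immediately; the function names and durations are illustrative, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Ordered workqueue model: one work at a time, in queue order,
 * so each work finishes only after every work queued before it. */
static void finish_times_ordered(const int *dur, int *finish, size_t n)
{
	int t = 0;

	for (size_t i = 0; i < n; i++) {
		t += dur[i];
		finish[i] = t;
	}
}

/* Non-ordered model: every queued work gets its own worker,
 * so a slow work does not delay the others. */
static void finish_times_parallel(const int *dur, int *finish, size_t n)
{
	for (size_t i = 0; i < n; i++)
		finish[i] = dur[i];
}
```

With durations {1000, 1, 1}, the ordered model finishes the two short works at 1001 and 1002, while the parallel model finishes them at 1. That is the delay flush_work() would see if the work it waits on sits behind a long-running one; if the first work never finishes, as with a worker stuck on RTNL, the ordered queue never reaches the others at all.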

From your description, WQ_NON_REENTRANT looks like it could fix the problem.
Johannes:  Any idea if that would be proper behaviour for this workqueue,
or would that add out-of-order and locking issues of its own?

Thanks,
Ben


--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com


