On 12/09/2010 08:17 AM, Tejun Heo wrote:
On 12/09/2010 03:46 PM, Tejun Heo wrote:
Right, so we're flushing here under RTNL ... I believe this is the one
that Ben hacked up to not flush or so?
He changed it to cancel instead of flush.
This makes me think that it's more likely to be a problem in the
flush_work() implementation. I went over the code carefully again but
couldn't find anything suspicious. Plus, most of the implementation
is shared between cancel and flush.
I'm gonna write some test code and see whether the flush code behaves
as expected, but in the meantime can you please apply the following
patch, trigger the problem and report the kernel log? Also, please
include the sysrq task dump. Let's see whether the worker is always
stuck at the same spot.
I saw a brief hang today, and did a sysrq-t, and then saw the timer
printout you added here, but I think that was caused by sysrq-t.
The system recovered and ran fine.
The second time (after several hours of rebooting), the hang was worse
and the system ran OOM after maybe 30 seconds. I did a sysrq-t then.
I see quite a few printouts from your debug message, but all of them
after things start going OOM, and after sysrq-t.
Here's the console capture:
http://www.candelatech.com/~greearb/minicom_ath9k_log4.txt
Let me know if you need more traces like this if I hit it again.
Thanks,
Ben
--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com