On 28.12.2012 03:49, Minchan Kim wrote:
> Hello Zlatko,
>
> On Fri, Dec 28, 2012 at 03:16:38AM +0100, Zlatko Calusic wrote:
> > From: Zlatko Calusic <zlatko.calusic@xxxxxxxx>
> >
> > The unintended consequence of commit 4ae0a48b is that
> > wait_iff_congested() can now be called with NULL struct zone*
> > producing kernel oops like this:
>
> For a good description, it would be better to write a simple
> pseudo-code flow showing how a NULL zone is passed into
> wait_iff_congested(), because the kswapd code flow is too complex.
>
> As I read the code, we have the following lines above
> wait_iff_congested():
>
> 	if (!unbalanced_zone || blah blah)
> 		break;
>
> How can a NULL unbalanced_zone reach wait_iff_congested()?
Hello Minchan, and thanks for the comment.

That line was there before commit 4ae0a48b got in, and you're right,
it's what was protecting wait_iff_congested() from being called with a
NULL zone*. But then all that logic got collapsed into a simple
pgdat_balanced() call, and that's when I introduced the bug: I lost the
protection.
What I _think_ is happening (described in pseudo-code style) is that
after scanning the zones in the dma->highmem direction and concluding
that all zones are balanced (unbalanced_zone remains NULL!),
wake_up(&pgdat->pfmemalloc_wait) wakes up a lot of memory-hungry
processes (especially true in various aggressive tests/benchmarks) that
immediately drain and unbalance one or more zones. The pgdat_balanced()
call that immediately follows then returns false, but we still have
unbalanced_zone = NULL, remember? Oops...
But all of that is speculation that I can't prove at the moment. Of
course, if anybody thinks it's a credible explanation, I could add it to
the commit message, or even as a code comment, but I didn't want to be
overly imaginative. The fix itself is simple and real.
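
For illustration only, the suspected sequence described above can be
modelled in a small userspace C sketch. Nothing here is kernel code:
struct zone, struct pgdat, scan_zones(), node_balanced(), balance_step()
and the null_check flag are all hypothetical stand-ins, the balance test
is simplified to "every zone is at or above its high watermark", and the
NULL check is only a guess at the shape of a guard, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_ZONES 3

/* Toy stand-ins; these are NOT the kernel's real structures. */
struct zone {
	long free_pages;
	long high_wmark;
};

struct pgdat {
	struct zone zones[NR_ZONES];	/* ordered dma -> highmem */
};

enum outcome {
	NODE_BALANCED,		/* nothing left to wait for */
	WAITED_ON_ZONE,		/* the wait would get a valid zone */
	NULL_ZONE_PASSED,	/* the wait would get NULL: the oops case */
	WAIT_SKIPPED		/* hypothetical NULL check avoided the call */
};

static bool zone_balanced(const struct zone *z)
{
	return z->free_pages >= z->high_wmark;
}

/* Simplified assumption: a node is balanced only if every zone is. */
static bool node_balanced(const struct pgdat *pgdat)
{
	for (int i = 0; i < NR_ZONES; i++)
		if (!zone_balanced(&pgdat->zones[i]))
			return false;
	return true;
}

/* Scan dma -> highmem and remember an unbalanced zone, if any. */
static struct zone *scan_zones(struct pgdat *pgdat)
{
	struct zone *unbalanced_zone = NULL;

	for (int i = 0; i < NR_ZONES; i++)
		if (!zone_balanced(&pgdat->zones[i]))
			unbalanced_zone = &pgdat->zones[i];
	return unbalanced_zone;
}

static enum outcome balance_step(struct pgdat *pgdat, long drained,
				 bool null_check)
{
	struct zone *unbalanced_zone = scan_zones(pgdat);

	assert(drained >= 0);
	/* Models the wake_up(): woken tasks allocate right after the scan. */
	pgdat->zones[0].free_pages -= drained;

	if (node_balanced(pgdat))
		return NODE_BALANCED;
	if (unbalanced_zone)
		return WAITED_ON_ZONE;
	/* Scan saw no unbalanced zone, yet the node is unbalanced now. */
	return null_check ? WAIT_SKIPPED : NULL_ZONE_PASSED;
}
```

In this model, a node whose zones all pass the scan but lose pages
before the re-check ends up in the NULL_ZONE_PASSED case unless the
NULL check is enabled, which matches the window suspected above.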
Regards,
--
Zlatko