Re: [PATCH] mm: fix negative nr_isolated counts

On 02/11/2015 10:09 PM, Andrew Morton wrote:
> On Tue, 10 Feb 2015 23:06:09 -0800 (PST) Hugh Dickins <hughd@xxxxxxxxxx> wrote:
>
>> The vmstat interfaces are good at hiding negative counts (at least
>> when CONFIG_SMP); but if you peer behind the curtain, you find that
>> nr_isolated_anon and nr_isolated_file soon go negative, and grow ever
>> more negative: so they can absorb larger and larger numbers of isolated
>> pages, yet still appear to be zero.
>>
>> I'm happy to avoid a congestion_wait() when too_many_isolated() myself;
>> but I guess it's there for a good reason, in which case we ought to get
>> too_many_isolated() working again.
>>
>> The imbalance comes from isolate_migratepages()'s ISOLATE_ABORT case:
>> putback_movable_pages() decrements the NR_ISOLATED counts, but we forgot
>> to call acct_isolated() to increment them.
>
> So if I'm understanding this correctly, shrink_inactive_list()'s call
> to congestion_wait() basically never happens?

I think so: the more negative the counters go, the smaller the chance that congestion_wait() gets called from there.
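
To illustrate the effect, here is a toy userspace model in C. It is only a sketch, not kernel code: every toy_* name is made up for this example, the clamp-to-zero read models the "hiding" of negative counts that Hugh describes, and the comparison in toy_too_many_isolated() is a simplified stand-in for the real check.

```c
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not kernel symbols. */
struct toy_zone {
	long nr_inactive;	/* pages currently on the inactive LRU */
	long nr_isolated;	/* raw counter; may drift negative */
};

/* Readers see negative values reported as zero (assumed clamp). */
long toy_read_isolated(const struct toy_zone *z)
{
	return z->nr_isolated < 0 ? 0 : z->nr_isolated;
}

/*
 * Take n pages off the LRU. 'account' models whether the increment
 * (the acct_isolated() step) was done; false models the aborted path.
 */
void toy_isolate(struct toy_zone *z, long n, bool account)
{
	z->nr_inactive -= n;
	if (account)
		z->nr_isolated += n;
}

/* Putting pages back always decrements the counter. */
void toy_putback(struct toy_zone *z, long n)
{
	z->nr_inactive += n;
	z->nr_isolated -= n;
}

/* Simplified throttle check: too many isolated relative to inactive. */
bool toy_too_many_isolated(const struct toy_zone *z)
{
	return toy_read_isolated(z) > z->nr_inactive;
}
```

With 1000 inactive pages, ten aborted cycles of 32 pages each (isolate without accounting, then putback) leave the raw counter at -320 while it still reads as zero. If 600 pages are then isolated with proper accounting, the counter is only 280 against 400 inactive pages, so the check does not fire; with balanced accounting the same 600 isolated pages would trip it.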

> If so I'm pretty reluctant to merge this up until it has had plenty of
> careful testing - there's a decent chance that it will make the kernel
> behave worse.

You mean "worse" in the sense of letting shrink_inactive_list() call congestion_wait() again, as it did before 3.18 (and, it seems, since 2009)? Maybe the wait is no longer needed, but IMHO it shouldn't be disabled by accident; it should be properly evaluated and then removed. Hugh's patch just fixes the accidental disabling.

>> Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
>> Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
>> Cc: stable@xxxxxxxxxxxxxxx # v3.18+
>
> And why -stable?  What user-visible problem is the bug causing?


Commit 35cd78156c "vmscan: throttle direct reclaim when too many pages are isolated already" by Rik seems to have introduced this congestion_wait() based on too_many_isolated(). The bug it fixed was:

"When way too many processes go into direct reclaim, it is possible for all of the pages to be taken off the LRU. One result of this is that the next process in the page reclaim code thinks there are no reclaimable pages left and triggers an out of memory kill."

So either this is now prevented by something else and too_many_isolated() could go away, or we should restore its functionality. Any idea, Rik?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .