On Sun, May 15, 2011 at 11:27 AM, Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
> On Sun, May 15, 2011 at 09:37:58AM +0800, Minchan Kim wrote:
>> On Sun, May 15, 2011 at 2:43 AM, Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
>> > Copying back linux-mm.
>> >
>> >> Recently, we added the following patch.
>> >> https://lkml.org/lkml/2011/4/26/129
>> >> If it's the culprit, the patch should solve the problem.
>> >
>> > It would probably be better to not do the allocations at all under
>> > memory pressure. Even if the RA allocation doesn't go into reclaim
>>
>> Fair enough.
>> I think we can do it easily now.
>> If page_cache_alloc_readahead (i.e., GFP_NORETRY) fails, we can adjust
>> the RA window size or turn it off for a while. The point is that we can
>> use the failure of __do_page_cache_readahead as a sign of memory pressure.
>> Wu, what do you think?
>
> No, disabling readahead can hardly help.
>
> The sequential readahead memory consumption can be estimated by
>
>     2 * (number of concurrent read streams) * (readahead window size)
>
> And you can double that when there are two levels of readahead.
>
> Since there are hardly any concurrent read streams in Andy's case,
> the readahead memory consumption will be negligible.
>
> Typically readahead thrashing will happen long before excessive
> GFP_NORETRY failures, so the reasonable solutions are to
>
> - shrink the readahead window on readahead thrashing
>   (the current readahead heuristic can somewhat do this, and I have
>   patches to further improve it)
>
> - prevent abnormal GFP_NORETRY failures
>   (when there are many reclaimable pages)
>
> Andy's OOM memory dump (incorrect_oom_kill.txt.xz) shows that there are
>
> - 8MB active+inactive file pages
> - 160MB active+inactive anon pages
> - 1GB shmem pages
> - 1.4GB unevictable pages
>
> Hmm, why are there so many unevictable pages? How come the shmem
> pages become unevictable when there is plenty of swap space?
I have no clue, but this patch (from Minchan, whitespace-damaged) seems to help:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f6b435c..4d24828 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2251,6 +2251,10 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 	unsigned long balanced = 0;
 	bool all_zones_ok = true;
 
+	/* If kswapd has been running too long, just sleep */
+	if (need_resched())
+		return false;
+
 	/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
 	if (remaining)
 		return true;
@@ -2286,7 +2290,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 	 * must be balanced
 	 */
 	if (order)
-		return pgdat_balanced(pgdat, balanced, classzone_idx);
+		return !pgdat_balanced(pgdat, balanced, classzone_idx);
 	else
 		return !all_zones_ok;
 }

I haven't tested it very thoroughly, but it's survived much longer than an
unpatched kernel probably would have under moderate use. I have no idea
what the patch does :)  I'm happy to run any tests.

I'm also planning to upgrade from 2GB to 8GB RAM soon, which might change
something.

--Andy