On 23/06/11 12:46, Mel Gorman wrote:
> Based on the information you have provided from sysrq and the profile,
> I put together a theory as to what is going wrong for your machine at
> least although I somehow doubt the same fix will work for Dan. Can you
> try out the following please? It's against 2.6.38.8 (and presumably
> Fedora) but will apply with offset against 2.6.39 and 3.0-rc4.
>
> ==== CUT HERE ====
> mm: vmscan: Correct check for kswapd sleeping in sleeping_prematurely
>
> During allocator-intensive workloads, kswapd will be woken frequently
> causing free memory to oscillate between the high and min watermark.
> This is expected behaviour.
>
> A problem occurs if the highest zone is small that keeps kswapd awake.
> balance_pgdat() only considers unreclaimable zones when priority
> is DEF_PRIORITY but sleeping_prematurely considers all zones. It's
> possible for this sequence to occur
>
>   1. kswapd wakes up and enters balance_pgdat()
>   2. At DEF_PRIORITY, marks highest zone unreclaimable
>   3. At DEF_PRIORITY-1, ignores highest zone setting end_zone
>   4. At DEF_PRIORITY-1, calls shrink_slab freeing memory from
>      highest zone, clearing all_unreclaimable. Highest zone
>      is still unbalanced
>   5. kswapd returns and calls sleeping_prematurely before sleep
>   6. sleeping_prematurely looks at *all* zones, not just the ones
>      being considered by balance_pgdat. The highest small zone
>      has all_unreclaimable cleared but the zone is not
>      balanced. all_zones_ok is false so kswapd stays awake
>
> The impact is that kswapd chews up a lot of CPU as it avoids most of
> the scheduling points and reclaims excessively from the lower zones.
> This patch corrects the behaviour of sleeping_prematurely to check
> the zones balance_pgdat() checked.
>
> Reported-by: Pádraig Brady <P@xxxxxxxxxxxxxx>
> Not-signed-off-awaiting-confirmation: Mel Gorman <mgorman@xxxxxxx>
> ---
>  mm/vmscan.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a74bf72..a578535 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2261,7 +2261,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
>  		return true;
>
>  	/* Check the watermark levels */
> -	for (i = 0; i < pgdat->nr_zones; i++) {
> +	for (i = 0; i <= classzone_idx; i++) {
>  		struct zone *zone = pgdat->node_zones + i;
>
>  		if (!populated_zone(zone))

No joy :(

cheers,
Pádraig.
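
For anyone following the thread, the loop-bound change in the quoted patch can be modelled in user space. What follows is only a rough sketch, not kernel code: `struct toy_zone`, `toy_sleeping_prematurely()` and `example_zones` are hypothetical names invented for illustration, the watermark test is reduced to a plain comparison of free pages against a high watermark, and the order and `remaining` handling of the real function is left out. The zone sizes and the classzone_idx of 1 are assumptions as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the kernel's struct zone. */
struct toy_zone {
	bool populated;            /* models populated_zone(zone) */
	bool all_unreclaimable;    /* models zone->all_unreclaimable */
	unsigned long free_pages;  /* free pages currently in the zone */
	unsigned long high_wmark;  /* models high_wmark_pages(zone) */
};

/*
 * Toy model of the watermark loop only. Returns true ("kswapd must stay
 * awake") if any considered zone in [0, last_idx] is below its high
 * watermark. Unpopulated and all_unreclaimable zones are skipped.
 *
 * last_idx = nr_zones - 1   models the loop before the patch
 * last_idx = classzone_idx  models the loop after the patch
 */
static bool toy_sleeping_prematurely(const struct toy_zone *zones, int last_idx)
{
	assert(zones != NULL && last_idx >= 0);

	for (int i = 0; i <= last_idx; i++) {
		const struct toy_zone *zone = &zones[i];

		if (!zone->populated || zone->all_unreclaimable)
			continue;
		if (zone->free_pages < zone->high_wmark)
			return true;	/* unbalanced zone: do not sleep */
	}
	return false;
}

/*
 * Assumed example node: two balanced lower zones and a small highest zone
 * that is unbalanced with all_unreclaimable cleared (step 6 in the quoted
 * sequence).
 */
static const struct toy_zone example_zones[3] = {
	{ .populated = true, .all_unreclaimable = false,
	  .free_pages = 5000, .high_wmark = 1000 },
	{ .populated = true, .all_unreclaimable = false,
	  .free_pages = 9000, .high_wmark = 4000 },
	{ .populated = true, .all_unreclaimable = false,
	  .free_pages = 10, .high_wmark = 50 },
};
```

In this model, toy_sleeping_prematurely(example_zones, 2) walks all three zones as the pre-patch loop would and returns true because of the small highest zone, whereas toy_sleeping_prematurely(example_zones, 1) stops at the assumed classzone_idx and returns false. It illustrates the intent of the patch only and says nothing about why the patch did not help here.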