Re: zone state overhead

Shaohua Li <shaohua.li@xxxxxxxxx> · Wed, 29 Sep 2010 12:47:18 +0800



On Wed, 2010-09-29 at 12:02 +0800, David Rientjes wrote:
> On Tue, 28 Sep 2010, Mel Gorman wrote:
> 
> > This is true. It's helpful to remember why this patch exists. Under heavy
> > memory pressure, large machines run the risk of live-locking because the
> > NR_FREE_PAGES gets out of sync. The test case mentioned above is under
> > memory pressure so it is potentially at risk. Ordinarily, we would be less
> > concerned with performance under heavy memory pressure and more concerned with
> > correctness of behaviour. The percpu_drift_mark is set at a point where the
> > risk is "real".  Lowering it will help performance but increase risk. Reducing
> > stat_threshold shifts the cost elsewhere by increasing the frequency the
> > vmstat counters are updated which I considered to be worse overall.
> > 
> > Which of these is better or is there an alternative suggestion on how
> > this livelock can be avoided?
> > 
> 
> I don't think the risk is quite real based on the calculation of 
> percpu_drift_mark using the high watermark instead of the min watermark.  
> For Shaohua's 64 cpu system:
> 
> Node 3, zone   Normal
> pages free     2055926
>         min      1441
>         low      1801
>         high     2161
>         scanned  0
>         spanned  2097152
>         present  2068480
>   vm stats threshold: 98
> 
> It's possible that we'll be 98 pages/cpu * 64 cpus = 6272 pages off in the 
> NR_FREE_PAGES accounting at any given time.  So to avoid depleting memory 
> reserves at the min watermark, which is livelock, and unnecessarily 
> spending time doing reclaim, percpu_drift_mark should be
> 1801 + 6272 = 8073 pages.  Instead, we're currently using the high 
> watermark, so percpu_drift_mark is 8433 pages.
> 
> It's plausible that we never reclaim sufficient memory that we ever get 
> above the high watermark since we only trigger reclaim when we can't 
> allocate above low, so we may be stuck calling zone_page_state_snapshot() 
> constantly.
> 
> I'd be interested to see if this patch helps.
> ---
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -154,7 +154,7 @@ static void refresh_zone_stat_thresholds(void)
>  		tolerate_drift = low_wmark_pages(zone) - min_wmark_pages(zone);
>  		max_drift = num_online_cpus() * threshold;
>  		if (max_drift > tolerate_drift)
> -			zone->percpu_drift_mark = high_wmark_pages(zone) +
> +			zone->percpu_drift_mark = low_wmark_pages(zone) +
>  					max_drift;
>  	}
>  }

I'm afraid not. I tried Christoph's patch, and it doesn't help.
In that patch the threshold = 6272/2 = 3136, so percpu_drift_mark
is 3136 + 2161 = 5297, which is already below 8073.
