Re: [RFC][PATCH 3/3] mm: reserve max drift pages at boot time instead using zone_page_state_snapshot()

KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx> · Thu, 14 Oct 2010 11:39:34 +0900 (JST)

> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 53627fa..194bdaa 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4897,6 +4897,15 @@ static void setup_per_zone_wmarks(void)
> >  	for_each_zone(zone) {
> >  		u64 tmp;
> >  
> > +		/*
> > +		 * If the max drift is less than 1%, reserve max drift pages
> > +		 * instead of doing the costly runtime calculation.
> > +		 */
> > +		if (zone->percpu_drift_mark < (zone->present_pages/100)) {
> > +			pages_min += zone->percpu_drift_mark;
> > +			zone->percpu_drift_mark = 0;
> > +		}
> > +
> 
> I don't see how this solves Shaohua's problem as such. Large systems will
> still suffer a big performance penalty from zone_page_state_snapshot(). I
> do see the logic of adjusting min for larger systems to limit the amount of
> time per-cpu thresholds are lowered but that would be as a follow-on to my
> patch rather than a replacement.

My patch rescues systems with 256 CPUs or fewer. And I assumed that 4096-CPU systems
don't run IO-intensive workloads such as Shaohua's case; they always use cpusets and
run HPC workloads.

If you know of any other system with more than 1024 CPUs, please let me know.
And again, my patch works on a 4096-CPU system, albeit slowly, but yours doesn't.

Am I missing something?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .