On Tue, 6 Dec 2011 15:59:57 -0800 Ying Han <yinghan@xxxxxxxxxx> wrote: > Under the shrink_zone, we examine whether or not to reclaim from a memcg > based on its softlimit. We skip scanning the memcg for the first 3 priority. > This is to balance between isolation and efficiency. we don't want to halt > the system by skipping memcgs with low-hanging fruits forever. > > Another change is to set soft_limit_in_bytes to 0 by default. This is needed > for both functional and performance: > > 1. If soft_limit are all set to MAX, it wastes first three periority iterations > without scanning anything. > > 2. By default every memcg is eligibal for softlimit reclaim, and we can also > set the value to MAX for special memcg which is immune to soft limit reclaim. > Could you update softlimit doc ? > Signed-off-by: Ying Han <yinghan@xxxxxxxxxx> > --- > include/linux/memcontrol.h | 7 ++++ > kernel/res_counter.c | 1 - > mm/memcontrol.c | 8 +++++ > mm/vmscan.c | 67 ++++++++++++++++++++++++++----------------- > 4 files changed, 55 insertions(+), 28 deletions(-) > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 81aabfb..53d483b 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -107,6 +107,8 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *, > struct mem_cgroup_reclaim_cookie *); > void mem_cgroup_iter_break(struct mem_cgroup *, struct mem_cgroup *); > > +bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *); > + > /* > * For memory reclaim. > */ > @@ -293,6 +295,11 @@ static inline void mem_cgroup_iter_break(struct mem_cgroup *root, > { > } > > +static inline bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *mem) > +{ > + return true; > +} > + > static inline int mem_cgroup_get_reclaim_priority(struct mem_cgroup *memcg) > { > return 0; > diff --git a/kernel/res_counter.c b/kernel/res_counter.c > index b814d6c..92afdc1 100644 > --- a/kernel/res_counter.c > +++ b/kernel/res_counter.c > @@ -18,7 +18,6 @@ void res_counter_init(struct res_counter *counter, struct res_counter *parent) > { > spin_lock_init(&counter->lock); > counter->limit = RESOURCE_MAX; > - counter->soft_limit = RESOURCE_MAX; > counter->parent = parent; > } > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 4425f62..7c6cade 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -926,6 +926,14 @@ out: > } > EXPORT_SYMBOL(mem_cgroup_count_vm_event); > > +bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *mem) > +{ > + if (mem_cgroup_disabled() || mem_cgroup_is_root(mem)) > + return true; > + > + return res_counter_soft_limit_excess(&mem->res) > 0; > +} > + > /** > * mem_cgroup_zone_lruvec - get the lru list vector for a zone and memcg > * @zone: zone of the wanted lruvec > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 0ba7d35..b36d91b 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -2091,6 +2091,17 @@ restart: > throttle_vm_writeout(sc->gfp_mask); > } > > +static bool should_reclaim_mem_cgroup(struct scan_control *sc, > + struct mem_cgroup *mem, > + int priority) > +{ > + if (!global_reclaim(sc) || priority <= DEF_PRIORITY - 3 || > + mem_cgroup_soft_limit_exceeded(mem)) > + return true; > + > + return false; > +} > + Why "priority <= DEF_PRIORTY - 3" is selected ? It seems there is no reason. Could you justify this check ? Thinking briefly, can't we caluculate the ratio as number of pages in reclaimable memcg / number of reclaimable pages And use 'priorty' ? If total_reclaimable_pages >> priority > number of pages in reclaimabe memcg memcg under softlimit should be scanned..then, we can avoid scanning pages twice. Hmm, please give reason of the magic value here, anyway. > static void shrink_zone(int priority, struct zone *zone, > struct scan_control *sc) > { > @@ -2108,7 +2119,9 @@ static void shrink_zone(int priority, struct zone *zone, > .zone = zone, > }; > > - shrink_mem_cgroup_zone(priority, &mz, sc); > + if (should_reclaim_mem_cgroup(sc, memcg, priority)) > + shrink_mem_cgroup_zone(priority, &mz, sc); > + > /* > * Limit reclaim has historically picked one memcg and > * scanned it with decreasing priority levels until > @@ -2152,8 +2165,8 @@ static bool shrink_zones(int priority, struct zonelist *zonelist, > { > struct zoneref *z; > struct zone *zone; > - unsigned long nr_soft_reclaimed; > - unsigned long nr_soft_scanned; > +// unsigned long nr_soft_reclaimed; > +// unsigned long nr_soft_scanned; Why do you leave these things ? Hmm, but the whole logic seems clean to me except for magic number. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>