On Thu, Jun 2, 2011 at 2:55 PM, Ying Han <yinghan@xxxxxxxxxx> wrote:
> On Tue, May 31, 2011 at 11:25 PM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>> Currently, soft limit reclaim is entered from kswapd, where it selects
>> the memcg with the biggest soft limit excess in absolute bytes, and
>> reclaims pages from it with maximum aggressiveness (priority 0).
>>
>> This has the following disadvantages:
>>
>> 1. because of the aggressiveness, kswapd can be stalled on a memcg
>>    that is hard to reclaim from for a long time, sending the rest of
>>    the allocators into direct reclaim in the meantime.
>>
>> 2. it only considers the biggest offender (in absolute bytes, no
>>    less, so very unhandy for setups with different-sized memcgs) and
>>    does not apply any pressure at all on other memcgs in excess.
>>
>> 3. because it is only invoked from kswapd, the soft limit is
>>    meaningful during global memory pressure, but it is not taken into
>>    account during hierarchical target reclaim where it could allow
>>    prioritizing memcgs as well. So while it does hierarchical
>>    reclaim once triggered, it is not a truly hierarchical mechanism.
>>
>> Here is a different approach. Instead of having a soft limit reclaim
>> cycle separate from the rest of reclaim, this patch ensures that each
>> time a group of memcgs is reclaimed - be it because of global memory
>> pressure or because of a hard limit - memcgs that exceed their soft
>> limit, or contribute to the soft limit excess of one of their parents,
>> are reclaimed from at a higher priority than their siblings.
>>
>> This results in the following:
>>
>> 1. all relevant memcgs are scanned with increasing priority during
>>    memory pressure. The primary goal is to free pages, not to punish
>>    soft limit offenders.
>>
>> 2. increased pressure is applied to all memcgs in excess of their
>>    soft limit, not only the biggest offender.
>>
>> 3. the soft limit becomes meaningful for target reclaim as well,
>>    where it allows prioritizing children of a hierarchy when the
>>    parent hits its limit.
>>
>> 4. direct reclaim now also applies increased soft limit pressure,
>>    not just kswapd anymore.
>>
>> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
>> ---
>>  include/linux/memcontrol.h |    7 +++++++
>>  mm/memcontrol.c            |   26 ++++++++++++++++++++++++++
>>  mm/vmscan.c                |    8 ++++++--
>>  3 files changed, 39 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 8f402b9..7d99e87 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -104,6 +104,7 @@ extern void mem_cgroup_end_migration(struct mem_cgroup *mem,
>>  struct mem_cgroup *mem_cgroup_hierarchy_walk(struct mem_cgroup *,
>>  					     struct mem_cgroup *);
>>  void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *, struct mem_cgroup *);
>> +bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *, struct mem_cgroup *);
>>
>>  /*
>>   * For memory reclaim.
>> @@ -345,6 +346,12 @@ static inline void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *r,
>>  {
>>  }
>>
>> +static inline bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *root,
>> +						  struct mem_cgroup *mem)
>> +{
>> +	return false;
>> +}
>> +
>>  static inline void
>>  mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
>>  {
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 983efe4..94f77cc3 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1460,6 +1460,32 @@ void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *root,
>>  	css_put(&mem->css);
>>  }
>>
>> +/**
>> + * mem_cgroup_soft_limit_exceeded - check if a memcg (hierarchically)
>> + *                                  exceeds a soft limit
>> + * @root: highest ancestor of @mem to consider
>> + * @mem: memcg to check for excess
>> + *
>> + * The function indicates whether @mem has exceeded its own soft
>> + * limit, or contributes to the soft limit excess of one of its
>> + * parents in the hierarchy below @root.
>> + */
>> +bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *root,
>> +				    struct mem_cgroup *mem)
>> +{
>> +	for (;;) {
>> +		if (mem == root_mem_cgroup)
>> +			return false;
>> +		if (res_counter_soft_limit_excess(&mem->res))
>> +			return true;
>> +		if (mem == root)
>> +			return false;
>> +		mem = parent_mem_cgroup(mem);
>> +		if (!mem)
>> +			return false;
>> +	}
>> +}
>> +
>>  static unsigned long mem_cgroup_reclaim(struct mem_cgroup *mem,
>>  					gfp_t gfp_mask,
>>  					unsigned long flags)
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index c7d4b44..0163840 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1988,9 +1988,13 @@ static void shrink_zone(int priority, struct zone *zone,
>>  		unsigned long reclaimed = sc->nr_reclaimed;
>>  		unsigned long scanned = sc->nr_scanned;
>>  		unsigned long nr_reclaimed;
>> +		int epriority = priority;
>> +
>> +		if (mem_cgroup_soft_limit_exceeded(root, mem))
>> +			epriority -= 1;
>
> Here we grant the ability to shrink from all the memcgs, but only
> raise the priority for those that exceed the soft_limit. That is a
> design change for the "soft_limit", which gives a hint as to which
> memcgs to reclaim from first under global memory pressure.

Basically, we shouldn't reclaim from a memcg under its soft_limit
unless we have trouble reclaiming pages from others. Something like
the following makes better sense:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index bdc2fd3..b82ba8c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1989,6 +1989,8 @@ restart:
 	throttle_vm_writeout(sc->gfp_mask);
 }

+#define MEMCG_SOFTLIMIT_RECLAIM_PRIORITY 2
+
 static void shrink_zone(int priority, struct zone *zone,
 			struct scan_control *sc)
 {
@@ -2001,13 +2003,13 @@ static void shrink_zone(int priority, struct zone *zone,
 		unsigned long reclaimed = sc->nr_reclaimed;
 		unsigned long scanned = sc->nr_scanned;
 		unsigned long nr_reclaimed;
-		int epriority = priority;

-		if (mem_cgroup_soft_limit_exceeded(root, mem))
-			epriority -= 1;
+		if (!mem_cgroup_soft_limit_exceeded(root, mem) &&
+				priority > MEMCG_SOFTLIMIT_RECLAIM_PRIORITY)
+			continue;

 		sc->mem_cgroup = mem;
-		do_shrink_zone(epriority, zone, sc);
+		do_shrink_zone(priority, zone, sc);
 		mem_cgroup_count_reclaim(mem, current_is_kswapd(),
 					 mem != root, /* limit or hierarchy? */
 					 sc->nr_scanned - scanned,

--Ying

>
> --Ying
>
>
>>
>>  		sc->mem_cgroup = mem;
>> -		do_shrink_zone(priority, zone, sc);
>> +		do_shrink_zone(epriority, zone, sc);
>>  		mem_cgroup_count_reclaim(mem, current_is_kswapd(),
>>  					 mem != root, /* limit or hierarchy? */
>>  					 sc->nr_scanned - scanned,
>> @@ -2480,7 +2484,7 @@ loop_again:
>>  			 * Call soft limit reclaim before calling shrink_zone.
>>  			 * For now we ignore the return value
>>  			 */
>> -			mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask);
>> +			//mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask);
>>
>>  			/*
>>  			 * We put equal pressure on every zone, unless
>> --
>> 1.7.5.2
>>
>>
>
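
For comparing the two policies discussed in this thread, here is a minimal userspace sketch. It is not kernel code: struct toy_memcg and the toy_* helpers are hypothetical stand-ins invented for illustration, the soft limit check is reduced to a plain usage comparison, and the root_mem_cgroup special case from the patch is left out. The clamp at priority 0 in toy_priority_boost() is an assumption of this sketch; the quoted hunk does not have one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg; not the kernel's struct mem_cgroup. */
struct toy_memcg {
	struct toy_memcg *parent;	/* NULL at the top of the hierarchy */
	unsigned long usage;		/* pages charged to this group */
	unsigned long soft_limit;	/* soft limit in pages */
};

/* Same value as in the counter-proposal's diff. */
#define MEMCG_SOFTLIMIT_RECLAIM_PRIORITY 2

/*
 * Follows the walk in mem_cgroup_soft_limit_exceeded(): true if @mem or
 * any ancestor up to and including @root is over its soft limit.
 */
static bool toy_soft_limit_exceeded(const struct toy_memcg *root,
				    const struct toy_memcg *mem)
{
	for (; mem; mem = mem->parent) {
		if (mem->usage > mem->soft_limit)
			return true;
		if (mem == root)
			return false;
	}
	return false;
}

/*
 * Policy A (original patch): every memcg is scanned, and those in excess
 * are scanned one priority level harder. Returns the effective priority.
 * Clamping at 0 is an assumption of this sketch.
 */
static int toy_priority_boost(const struct toy_memcg *root,
			      const struct toy_memcg *mem, int priority)
{
	assert(priority >= 0);
	if (toy_soft_limit_exceeded(root, mem) && priority > 0)
		return priority - 1;
	return priority;
}

/*
 * Policy B (counter-proposal): memcgs within their soft limit are skipped
 * until priority has dropped to MEMCG_SOFTLIMIT_RECLAIM_PRIORITY.
 * Returns true if @mem should be scanned at @priority.
 */
static bool toy_should_scan(const struct toy_memcg *root,
			    const struct toy_memcg *mem, int priority)
{
	if (!toy_soft_limit_exceeded(root, mem) &&
	    priority > MEMCG_SOFTLIMIT_RECLAIM_PRIORITY)
		return false;
	return true;
}
```

Under policy A a group within its soft limit is still scanned at every priority level, only less aggressively than its siblings in excess. Under policy B it is left alone until reclaim from the groups in excess has failed to make progress down to the threshold priority.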