[RFC PATCH 3/5] rework softlimit reclaim.

The "soft_limit" was introduced in memcg to support over-committing the memory
resource on the host. Each cgroup can be configured with a "hard_limit"; a
cgroup that goes over it is throttled or OOM killed. Allocations may, however,
go above the "soft_limit" as long as there is no memory contention.
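
To make the difference between the two limits concrete, here is a minimal
userspace sketch. It is not kernel code: struct memcg_model and both helpers
are hypothetical names invented for illustration, and the model assumes usage
never starts above hard_limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one cgroup's memory counter. */
struct memcg_model {
	unsigned long long usage;      /* bytes currently charged */
	unsigned long long soft_limit; /* may be exceeded when there is no contention */
	unsigned long long hard_limit; /* a charge that would pass this is refused */
};

/*
 * Try to charge nr bytes. Only the hard limit can refuse a charge; the
 * soft limit is never consulted here. Assumes usage <= hard_limit on entry.
 */
static bool model_charge(struct memcg_model *m, unsigned long long nr)
{
	assert(m != NULL);
	if (nr > m->hard_limit - m->usage)
		return false;
	m->usage += nr;
	return true;
}

/* Bytes above the soft limit, or 0 when usage is within it. */
static unsigned long long model_soft_limit_excess(const struct memcg_model *m)
{
	assert(m != NULL);
	return m->usage > m->soft_limit ? m->usage - m->soft_limit : 0;
}
```

In this model a cgroup with soft_limit 100 and hard_limit 200 can be charged up
to 150 and simply reports an excess of 50; only a charge that would cross 200
fails.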

The current implementation of softlimit reclaim has several disadvantages:
1. It is triggered under global reclaim, and acts as a best-effort pass before
global LRU scanning.

2. It is based on a per-zone RB tree, from which only the cgroup that exceeds
its soft_limit the most is selected for reclaim. In other words, there is no
fairness.

3. It does not consider how many pages the cgroup actually has allocated on
the zone.

4. The target of softlimit reclaim is to bring one cgroup's usage under its
soft_limit, whereas global reclaim has a different target.

After the "memcg-aware global reclaim" work from Johannes, we are able to
integrate softlimit reclaim better with the rest of the reclaim logic.

Here is how it works now:
1. The soft_limit check is integrated into shrink_zone(), which is called from
both targeted and global reclaim. However, we only check the soft_limit under
global reclaim.

2. mem_cgroup_hierarchy_walk() is now integrated into shrink_zone(), and the
soft_limit works as a filter on which memcgs to reclaim from, based on the
reclaim priority.

3. Don't reclaim from a memcg that is under its soft_limit unless page reclaim
is in trouble. For now we picked DEF_PRIORITY - 3 (as mhocko suggested), which
causes unbalanced zones and memcgs above their soft_limit to be scanned 3 times
before we start looking into the other memcgs (those under their soft_limit).
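
The priority filter described above can be sketched in userspace as follows.
This is an illustration only, not the kernel code: skip_memcg(),
count_skipped_passes() and SOFT_LIMIT_PASSES are hypothetical names, the
condition mirrors the check added to shrink_zone() in this patch, and
DEF_PRIORITY is assumed to be 12 as in mainline.

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY      12 /* assumed to match the kernel's DEF_PRIORITY */
#define SOFT_LIMIT_PASSES  3 /* hypothetical name for the "3" in this patch */

/*
 * Return true when a memcg should be skipped at this reclaim priority.
 * Priority starts at DEF_PRIORITY and counts down toward 0 as reclaim
 * gets into more trouble.
 */
static bool skip_memcg(bool global_reclaim, int priority, bool over_soft_limit)
{
	assert(priority >= 0 && priority <= DEF_PRIORITY);
	return global_reclaim &&
	       priority > DEF_PRIORITY - SOFT_LIMIT_PASSES &&
	       !over_soft_limit;
}

/* Count the priority levels at which a memcg would be skipped. */
static int count_skipped_passes(bool global_reclaim, bool over_soft_limit)
{
	int skipped = 0;

	for (int priority = DEF_PRIORITY; priority >= 0; priority--)
		if (skip_memcg(global_reclaim, priority, over_soft_limit))
			skipped++;
	return skipped;
}
```

Under these assumptions a memcg below its soft_limit is skipped at priorities
12, 11 and 10 during global reclaim and becomes eligible from priority 9 on; a
memcg above its soft_limit, or any memcg under targeted reclaim, is never
skipped by this check.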

TODO:
1. The concern is that we might end up burning CPU without reclaiming much,
depending on how much low-hanging fruit there is in the first 3 iterations.
This is the trade-off for meeting the user's expectation of "soft_limit".
Running more workloads and evaluating the results is needed.

Signed-off-by: Ying Han <yinghan@xxxxxxxxxx>
---
 include/linux/memcontrol.h |    7 +++++++
 mm/memcontrol.c            |    5 +++++
 mm/vmscan.c                |    4 ++++
 3 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ca5a18d..864c369 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -101,6 +101,7 @@ extern void mem_cgroup_end_migration(struct mem_cgroup *mem,
 struct mem_cgroup *mem_cgroup_hierarchy_walk(struct mem_cgroup *,
 					     struct mem_cgroup *);
 void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *, struct mem_cgroup *);
+bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *);
 
 /*
  * For memory reclaim.
@@ -341,6 +342,12 @@ static inline void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *r,
 {
 }
 
+static inline bool
+mem_cgroup_soft_limit_exceeded(struct mem_cgroup *mem)
+{
+	return false;
+}
+
 static inline void
 mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5228039..f2a1892 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1317,6 +1317,11 @@ void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *root,
 		css_put(&mem->css);
 }
 
+bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *mem)
+{
+	return res_counter_soft_limit_excess(&mem->res);
+}
+
 static unsigned long mem_cgroup_reclaim(struct mem_cgroup *mem,
 					gfp_t gfp_mask,
 					unsigned long flags)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d9376d1..85dcdd6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1992,6 +1992,10 @@ static void shrink_zone(int priority, struct zone *zone,
 		unsigned long scanned = sc->nr_scanned;
 		unsigned long nr_reclaimed;
 
+		if (global_reclaim(sc) && priority > DEF_PRIORITY - 3 &&
+			!mem_cgroup_soft_limit_exceeded(mem))
+			continue;
+
 		sc->mem_cgroup = mem;
 		do_shrink_zone(priority, zone, sc);
 		mem_cgroup_count_reclaim(mem, current_is_kswapd(),
-- 
1.7.3.1
