[PATCH] memcg: fix behavior of shard anon pages at task_move (Was Re: [PATCH v3 2/2] memcg: avoid THP split in task migration

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 9 Mar 2012 15:01:09 +0900
Daisuke Nishimura <nishimura@xxxxxxxxxxxxxxxxx> wrote:

> On Fri, 9 Mar 2012 12:24:48 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> > > I'd rather delete than add code here!
> > > 
> > 
> > As a user, for Fujitsu, I believe it's insane to move task between cgroups.
> > So, I have no benefit from this code, at all.
> > Ok, maybe I'm not a stakeholder,here.
> > 
> I agree that moving tasks between cgroup is not a sane operation,
> users won't do it so frequently, but I cannot prevent that.
> That's why I implemented this feature.
> 
> > If users say all shared pages should be moved, ok, let's move.
> > But change of behavior should be documented and implemented in an independet
> > patch. CC'ed Nishimura-san, he implemetned this, a real user.
> > 
> To be honest, shared anon is not my concern. My concern is 
> shared memory(that's why, mapcount is not checked as for file pages.
> I assume all processes sharing the same shared memory will be moved together).
> So, it's all right for me to change the behavior for shared anon(or leave
> it as it is).
> 

Thank you for comment. Then, here is a patch.

Other opinions ?

==
>From 1012e97e3b123fb80d0ec6b1f5d3dbc87a5a5139 Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
Date: Fri, 9 Mar 2012 16:22:32 +0900
Subject: [PATCH] memcg: fix/change behavior of shared anon at moving task.

In documentation, it's said that 'shared anon are not moved'.
But in implementation, this check was done.

  if (!move_anon() || page_mapcount(page) > 2)

Ah, memcg has been moving shared anon pages for a long time.

Then, here is a discussion about handling of shared anon pages.

 - It's complex
 - Now, shared file caches are moved in force.
 - It adds unclear check as page_mapcount(). To do correct check,
   we should check swap users, etc.
 - No one notice this implementation behavior. So, no one get benefit
   from the design.
 - In general, once task is moved to a cgroup for running, it will not
   be moved....
 - Finally, we have control knob as memory.move_charge_at_immigrate.


Here is a patch to allow moving shared pages. This makes memcg simpler
and fix current broken implementation.
I added a notice for what happens at fork() -> move -> exec() usage.

Note:
 IIUC, libcgroup's cgroup daemon moves tasks after exec().
 So, it's not affected. 
 libcgroup's command "cgexec" does move itsef to a memcg and call exec()
 without fork(). it's not affected.

Suggested-by: Hugh Dickins <hughd@xxxxxxxxxx>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
---
 Documentation/cgroups/memory.txt |   10 ++++++++--
 include/linux/swap.h             |    9 ---------
 mm/memcontrol.c                  |   15 +++------------
 mm/swapfile.c                    |   31 -------------------------------
 4 files changed, 11 insertions(+), 54 deletions(-)

diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index 4c95c00..16bc9f2 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -185,6 +185,9 @@ behind this approach is that a cgroup that aggressively uses a shared
 page will eventually get charged for it (once it is uncharged from
 the cgroup that brought it in -- this will happen on memory pressure).
 
+(*)See section 8.2. At task moving, you can recharge mapped pages to other
+   cgroup.
+
 Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.
 When you do swapoff and make swapped-out pages of shmem(tmpfs) to
 be backed into memory in force, charges for pages are accounted against the
@@ -623,8 +626,8 @@ memory cgroup.
   bit | what type of charges would be moved ?
  -----+------------------------------------------------------------------------
    0  | A charge of an anonymous page(or swap of it) used by the target task.
-      | Those pages and swaps must be used only by the target task. You must
-      | enable Swap Extension(see 2.4) to enable move of swap charges.
+      | Even if it's shared, it will be moved in force(*). You must enable Swap
+      | Extension(see 2.4) to enable move of swap charges.
  -----+------------------------------------------------------------------------
    1  | A charge of file pages(normal file, tmpfs file(e.g. ipc shared memory)
       | and swaps of tmpfs file) mmapped by the target task. Unlike the case of
@@ -635,6 +638,9 @@ memory cgroup.
       | page_mapcount(page) > 1). You must enable Swap Extension(see 2.4) to
       | enable move of swap charges.
 
+(*) Because of this, fork() -> move -> exec() will move all parent's page
+    to the target cgroup. Please be careful.
+
 8.3 TODO
 
 - Implement madvise(2) to let users decide the vma to be moved or not to be
diff --git a/include/linux/swap.h b/include/linux/swap.h
index f7df3ea..13c8d6f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -390,7 +390,6 @@ static inline void deactivate_swap_token(struct mm_struct *mm, bool swap_token)
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 extern void
 mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout);
-extern int mem_cgroup_count_swap_user(swp_entry_t ent, struct page **pagep);
 #else
 static inline void
 mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
@@ -535,14 +534,6 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent)
 {
 }
 
-#ifdef CONFIG_CGROUP_MEM_RES_CTLR
-static inline int
-mem_cgroup_count_swap_user(swp_entry_t ent, struct page **pagep)
-{
-	return 0;
-}
-#endif
-
 #endif /* CONFIG_SWAP */
 #endif /* __KERNEL__*/
 #endif /* _LINUX_SWAP_H */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c83aeb5..e7e4e3d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5100,12 +5100,9 @@ static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
 
 	if (!page || !page_mapped(page))
 		return NULL;
-	if (PageAnon(page)) {
-		/* we don't move shared anon */
-		if (!move_anon() || page_mapcount(page) > 2)
-			return NULL;
+	if (PageAnon(page) && !move_anon()) {
+		return NULL;
 	} else if (!move_file())
-		/* we ignore mapcount for file pages */
 		return NULL;
 	if (!get_page_unless_zero(page))
 		return NULL;
@@ -5116,18 +5113,12 @@ static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
 static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
 			unsigned long addr, pte_t ptent, swp_entry_t *entry)
 {
-	int usage_count;
 	struct page *page = NULL;
 	swp_entry_t ent = pte_to_swp_entry(ptent);
 
 	if (!move_anon() || non_swap_entry(ent))
 		return NULL;
-	usage_count = mem_cgroup_count_swap_user(ent, &page);
-	if (usage_count > 1) { /* we don't move shared anon */
-		if (page)
-			put_page(page);
-		return NULL;
-	}
+	page = lookup_swap_cache(ent);
 	if (do_swap_account)
 		entry->val = ent.val;
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index fa3c519..85b4548 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -720,37 +720,6 @@ int free_swap_and_cache(swp_entry_t entry)
 	return p != NULL;
 }
 
-#ifdef CONFIG_CGROUP_MEM_RES_CTLR
-/**
- * mem_cgroup_count_swap_user - count the user of a swap entry
- * @ent: the swap entry to be checked
- * @pagep: the pointer for the swap cache page of the entry to be stored
- *
- * Returns the number of the user of the swap entry. The number is valid only
- * for swaps of anonymous pages.
- * If the entry is found on swap cache, the page is stored to pagep with
- * refcount of it being incremented.
- */
-int mem_cgroup_count_swap_user(swp_entry_t ent, struct page **pagep)
-{
-	struct page *page;
-	struct swap_info_struct *p;
-	int count = 0;
-
-	page = find_get_page(&swapper_space, ent.val);
-	if (page)
-		count += page_mapcount(page);
-	p = swap_info_get(ent);
-	if (p) {
-		count += swap_count(p->swap_map[swp_offset(ent)]);
-		spin_unlock(&swap_lock);
-	}
-
-	*pagep = page;
-	return count;
-}
-#endif
-
 #ifdef CONFIG_HIBERNATION
 /*
  * Find the swap type that corresponds to given device (if any).
-- 
1.7.4.1











--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]