Re: memcg: add mlock statistic in memory.stat

On Wed, 11 Jan 2012 15:17:42 -0800 (PST)
Hugh Dickins <hughd@xxxxxxxxxx> wrote:

> On Wed, 11 Jan 2012, Ying Han wrote:
> 
> > We have the nr_mlock stat in both meminfo and vmstat system-wide. This
> > patch adds an mlock field to the per-memcg memory stats. The stat
> > enhances the metrics exported by memcg, especially when used together
> > with the "unevictable" lru stat.
> > 
> > --- a/include/linux/page_cgroup.h
> > +++ b/include/linux/page_cgroup.h
> > @@ -10,6 +10,7 @@ enum {
> >  	/* flags for mem_cgroup and file and I/O status */
> >  	PCG_MOVE_LOCK, /* For race between move_account v.s. following bits */
> >  	PCG_FILE_MAPPED, /* page is accounted as "mapped" */
> > +	PCG_MLOCK, /* page is accounted as "mlock" */
> >  	/* No lock in page_cgroup */
> >  	PCG_ACCT_LRU, /* page has been accounted for (under lru_lock) */
> >  	__NR_PCG_FLAGS,
> 
> Is this really necessary?  KAMEZAWA-san is engaged in trying to reduce
> the number of PageCgroup flags, and I expect that in due course we shall
> want to merge them in with Page flags, so adding more is unwelcome.
> I'd have thought that with memcg hooks in the right places,
> a separate flag would not be necessary?
> 

Please don't ;)

NR_UNEVICTABLE_LRU is not enough?

Following is the patch I posted before to remove PCG_FILE_MAPPED.
I think you can use similar logic and make use of the existing
unevictable/mlocked page flags, as in the sketch below.
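For example, a hypothetical mlock hook could then reuse the existing
PageMlocked bit instead of a new PCG_MLOCK flag. A rough, untested
sketch (MEMCG_NR_MLOCK is an assumed new stat index; the begin/end
helpers are the ones introduced by the patch below):

     void mlock_vma_page(struct page *page)
     {
             unsigned long flags;
             bool locked;

             /* pin pc->mem_cgroup across the flag test and the stat update */
             locked = mem_cgroup_begin_update_page_stats(page, &flags);
             if (!TestSetPageMlocked(page)) {
                     inc_zone_page_state(page, NR_MLOCK);
                     /* assumed mlock counterpart of MEMCG_NR_FILE_MAPPED */
                     mem_cgroup_inc_page_stat(page, MEMCG_NR_MLOCK);
             }
             mem_cgroup_end_update_page_stats(page, locked, &flags);
     }

The munlock side would mirror this with TestClearPageMlocked() and a
decrement.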

==
A better (lockless) idea is welcome.

From fd2b5822838eebbacc41f343f9eb8c6f0ad8e1cc Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
Date: Thu, 15 Dec 2011 11:42:49 +0900
Subject: [PATCH 2/5] memcg: safer page stat updating

Currently, page stat accounting is done like this:

     if (....set flag or some)
            update vmstat
            update memcg'stat

Unlike vmstat, memcg must take care of changes to pc->mem_cgroup.
This is handled by page_cgroup_move_lock and a per-stat flag for each statistic.
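For example, page_add_file_rmap() currently does:

     if (atomic_inc_and_test(&page->_mapcount)) {
             __inc_zone_page_state(page, NR_FILE_MAPPED);
             mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED);
     }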

I think FileMapped works well. But considering updates of other
statistics, the current logic doesn't work well. Assume the following
race between two CPUs:

     CPU A                                  CPU B
     set flag
     ..(delayed by preemption)..
                                            clear flag
                                            pc's flag is already unset,
                                            so nothing is updated
     memcg = pc->mem_cgroup
     set flag on pc
     update memcg stat

In this case, the stat is leaked: the increment on CPU A is never
balanced by a decrement. I think memcg's accounting routine should not
need its own flags. To avoid memcg-private flags, we need to prevent
pc->mem_cgroup from being overwritten while we update the memcg stat.

This patch adds
   - mem_cgroup_begin_update_page_stats()
   - mem_cgroup_end_update_page_stats()

and guarantees that pc->mem_cgroup is not overwritten between the two calls.
The caller should do:

     mem_cgroup_begin_update_page_stats()
     if (.... set flag or some)
            update vmstat
            update memcg's stat
     mem_cgroup_end_update_page_stats().

This begin...end pair checks a counter (which is 0 in most cases) under
rcu_read_lock/rcu_read_unlock, and takes a spinlock only when required.
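For reference, the "counter" is what the existing mem_cgroup_stealed()
tests; in the current code it is roughly the per-cpu count that
move_account raises (sketch from memory, details may differ):

     static bool mem_cgroup_stealed(struct mem_cgroup *memcg)
     {
             VM_BUG_ON(!rcu_read_lock_held());
             /* nonzero only while a task is moving charges between memcgs */
             return this_cpu_read(memcg->stat->count[MEM_CGROUP_ON_MOVE]) > 0;
     }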

A following patch in this series will remove the PCG_FILE_MAPPED flag.
---
 include/linux/memcontrol.h |   49 +++++++++++++++++++++++++++++++++++++++++-
 mm/memcontrol.c            |   50 +++++++++++++++++++++++++++++--------------
 mm/rmap.c                  |    5 ++++
 3 files changed, 86 insertions(+), 18 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 598b3c9..4a61c4b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -141,9 +141,52 @@ static inline bool mem_cgroup_disabled(void)
 	return false;
 }
 
-void mem_cgroup_update_page_stat(struct page *page,
+/*
+ * Unlike vmstat, a page's mem_cgroup can be overwritten, so the memcg to
+ * which the page stats should be accounted is determined dynamically.
+ * Unfortunately, there are many races. To avoid them, the caller should do
+ *
+ * locked = mem_cgroup_begin_update_page_stats(page, &flags);
+ * if (set page flags etc)
+ * 	mem_cgroup_update_page_stat(page, idx, val);
+ * mem_cgroup_end_update_page_stats(page, locked, &flags);
+ *
+ * Between the [begin, end) calls, the page's mem_cgroup will never change.
+ */
+void __mem_cgroup_update_page_stat(struct page *page,
+			enum mem_cgroup_page_stat_item idx,
+			int val);
+
+static inline void mem_cgroup_update_page_stat(struct page *page,
 				 enum mem_cgroup_page_stat_item idx,
-				 int val);
+				 int val)
+{
+	if (mem_cgroup_disabled())
+		return;
+	__mem_cgroup_update_page_stat(page, idx, val);
+}
+
+bool __mem_cgroup_begin_update_page_stats(struct page *page,
+		unsigned long *flags);
+static inline bool
+mem_cgroup_begin_update_page_stats(struct page *page, unsigned long *flags)
+{
+	if (mem_cgroup_disabled())
+		return false;
+	return __mem_cgroup_begin_update_page_stats(page, flags);
+}
+
+void __mem_cgroup_end_update_page_stats(struct page *page, bool locked,
+		unsigned long *flags);
+
+static inline void 
+mem_cgroup_end_update_page_stats(struct page *page,
+		bool locked, unsigned long *flags)
+{
+	if (mem_cgroup_disabled())
+		return;
+	__mem_cgroup_end_update_page_stats(page, locked, flags);
+}
 
 static inline void mem_cgroup_inc_page_stat(struct page *page,
 					    enum mem_cgroup_page_stat_item idx)
@@ -171,6 +214,8 @@ void mem_cgroup_split_huge_fixup(struct page *head);
 bool mem_cgroup_bad_page_check(struct page *page);
 void mem_cgroup_print_bad_page(struct page *page);
 #endif
+
+
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
 struct mem_cgroup;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d183e1b..f4e6d5c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1831,27 +1831,50 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask)
  * possibility of race condition. If there is, we take a lock.
  */
 
-void mem_cgroup_update_page_stat(struct page *page,
-				 enum mem_cgroup_page_stat_item idx, int val)
+/*
+ * This function takes rcu_read_lock(); the lock is released by
+ * __mem_cgroup_end_update_page_stats().
+ */
+bool __mem_cgroup_begin_update_page_stats(struct page *page, unsigned long *flags)
 {
 	struct mem_cgroup *memcg;
 	struct page_cgroup *pc = lookup_page_cgroup(page);
 	bool need_unlock = false;
-	unsigned long uninitialized_var(flags);
 
 	rcu_read_lock();
 	memcg = pc->mem_cgroup;
-	if (unlikely(!memcg || !PageCgroupUsed(pc)))
+	if (!memcg || !PageCgroupUsed(pc))
 		goto out;
-	/* pc->mem_cgroup is unstable ? */
 	if (unlikely(mem_cgroup_stealed(memcg)) || PageTransHuge(page)) {
-		/* take a lock against to access pc->mem_cgroup */
-		move_lock_page_cgroup(pc, &flags);
+		move_lock_page_cgroup(pc, flags);
 		need_unlock = true;
-		memcg = pc->mem_cgroup;
-		if (!memcg || !PageCgroupUsed(pc))
-			goto out;
 	}
+out:
+	return need_unlock;
+}
+EXPORT_SYMBOL(__mem_cgroup_begin_update_page_stats);
+
+void __mem_cgroup_end_update_page_stats(struct page *page, bool locked,
+			unsigned long *flags)
+{
+	struct page_cgroup *pc;
+
+	if (unlikely(locked)) {
+		pc = lookup_page_cgroup(page);
+		move_unlock_page_cgroup(pc, flags);
+	}
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL(__mem_cgroup_end_update_page_stats);
+
+void __mem_cgroup_update_page_stat(struct page *page,
+				 enum mem_cgroup_page_stat_item idx, int val)
+{
+	struct page_cgroup *pc = lookup_page_cgroup(page);
+	struct mem_cgroup *memcg = pc->mem_cgroup;
+
+	if (!memcg || !PageCgroupUsed(pc))
+		return;
 
 	switch (idx) {
 	case MEMCG_NR_FILE_MAPPED:
@@ -1866,14 +1889,9 @@ void mem_cgroup_update_page_stat(struct page *page,
 	}
 
 	this_cpu_add(memcg->stat->count[idx], val);
-
-out:
-	if (unlikely(need_unlock))
-		move_unlock_page_cgroup(pc, &flags);
-	rcu_read_unlock();
 	return;
 }
-EXPORT_SYMBOL(mem_cgroup_update_page_stat);
+EXPORT_SYMBOL(__mem_cgroup_update_page_stat);
 
 /*
  * size of first charge trial. "32" comes from vmscan.c's magic value.
diff --git a/mm/rmap.c b/mm/rmap.c
index 54d140a..3648c88 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1105,10 +1105,15 @@ void page_add_new_anon_rmap(struct page *page,
  */
 void page_add_file_rmap(struct page *page)
 {
+	unsigned long flags;
+	bool locked;
+
+	locked = mem_cgroup_begin_update_page_stats(page, &flags);
 	if (atomic_inc_and_test(&page->_mapcount)) {
 		__inc_zone_page_state(page, NR_FILE_MAPPED);
 		mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED);
 	}
+	mem_cgroup_end_update_page_stats(page, locked, &flags);
 }
 
 /**
-- 
1.7.4.1

