On Wed, Jan 11, 2012 at 3:59 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> On Wed, 11 Jan 2012 15:17:42 -0800 (PST)
> Hugh Dickins <hughd@xxxxxxxxxx> wrote:
>
>> On Wed, 11 Jan 2012, Ying Han wrote:
>>
>> > We have the nr_mlock stat both in meminfo and in vmstat system-wide; this
>> > patch adds an mlock field to the per-memcg memory stats. The stat itself
>> > enhances the metrics exported by memcg, especially when used together with
>> > the "unevictable" lru stat.
>> >
>> > --- a/include/linux/page_cgroup.h
>> > +++ b/include/linux/page_cgroup.h
>> > @@ -10,6 +10,7 @@ enum {
>> >  	/* flags for mem_cgroup and file and I/O status */
>> >  	PCG_MOVE_LOCK, /* For race between move_account v.s. following bits */
>> >  	PCG_FILE_MAPPED, /* page is accounted as "mapped" */
>> > +	PCG_MLOCK, /* page is accounted as "mlock" */
>> >  	/* No lock in page_cgroup */
>> >  	PCG_ACCT_LRU, /* page has been accounted for (under lru_lock) */
>> >  	__NR_PCG_FLAGS,
>>
>> Is this really necessary?  KAMEZAWA-san is engaged in trying to reduce
>> the number of PageCgroup flags, and I expect that in due course we shall
>> want to merge them in with Page flags, so adding more is unwelcome.
>> I'd have thought that with memcg_ hooks in the right places,
>> a separate flag would not be necessary?
>>
>
> Please don't ;)
>
> NR_UNEVICTABLE_LRU is not enough ?

Seems not. The unevictable lru includes more than mlock()'d pages
(SHM_LOCK'd pages etc.). There are use cases where we'd like to know the
mlocked size per-cgroup. We used to get that in the fake-NUMA based
containers by reading the per-node meminfo, but we lose that information
with memcg. What do you think?

Thank you Hugh and Kame for the reference. Apparently I missed that patch
and I will take a look at it; a rough sketch of how the mlock case might
use it is at the end of this mail. (Still catching up on email after
vacation.)

--Ying

>
> Following is the patch I posted before to remove PCG_FILE_MAPPED.
> Then, I think you can use similar logic and make use of UNEVICTABLE flags.
>
> ==
> A better (lockless) idea is welcomed.
>
> From fd2b5822838eebbacc41f343f9eb8c6f0ad8e1cc Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> Date: Thu, 15 Dec 2011 11:42:49 +0900
> Subject: [PATCH 2/5] memcg: safer page stat updating
>
> Now, page stat accounting is done like this:
>
> 	if (....set flag or some)
> 		update vmstat
> 		update memcg's stat
>
> Unlike vmstat, memcg must take care of changes in pc->mem_cgroup.
> This is done by page_cgroup_move_lock and per-stat flags.
>
> I think FileMapped works well. But, considering updates of other
> statistics, the current logic doesn't work well. Assume the following case:
>
> 	set flag
> 	..(delay by some preemption)..
> 				clear flag
> 				pc's flag is unset, so
> 				don't update anything.
> 	memcg = pc->mem_cgroup
> 	set flag to pc->mem_cgroup
> 	update memcg stat
>
> In this case, the stat update is leaked. I think memcg's accounting
> routine should not need its own flags. To avoid using memcg's own flags,
> we need to prevent pc->mem_cgroup from being overwritten while we update
> the memcg stat.
>
> This patch adds
>  - mem_cgroup_begin_update_page_stats()
>  - mem_cgroup_end_update_page_stats()
>
> and guarantees pc->mem_cgroup is not overwritten while updating.
> The caller should do
>
> 	mem_cgroup_begin_update_page_stats()
> 	if (.... set flag or some)
> 		update vmstat
> 		update memcg's stat
> 	mem_cgroup_end_update_page_stats()
>
> This begin...end pair checks a counter (which is 0 in most cases) under
> rcu_read_lock()/rcu_read_unlock(), and takes a spinlock if required.
>
> Following patch in this series will remove the PCG_FILE_MAPPED flag.
> ---
>  include/linux/memcontrol.h |   49 +++++++++++++++++++++++++++++++++++++++++-
>  mm/memcontrol.c            |   50 +++++++++++++++++++++++++++++--------------
>  mm/rmap.c                  |    5 ++++
>  3 files changed, 86 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 598b3c9..4a61c4b 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -141,9 +141,52 @@ static inline bool mem_cgroup_disabled(void)
>  	return false;
>  }
>
> -void mem_cgroup_update_page_stat(struct page *page,
> +/*
> + * Unlike vmstat, page's mem_cgroup can be overwritten and for which memcg
> + * the page stats should be accounted to is determined dynamically.
> + * Unfortunately, there are many races. To avoid races, the caller should do
> + *
> + *	locked = mem_cgroup_begin_update_page_stat(page)
> + *	if (set page flags etc)
> + *		mem_cgroup_update_page_stat(page);
> + *	mem_cgroup_end_update_page_stat(page, locked);
> + *
> + * Between [begin, end) calls, page's mem_cgroup will never be changed.
> + */
> +void __mem_cgroup_update_page_stat(struct page *page,
> +				enum mem_cgroup_page_stat_item idx,
> +				int val);
> +
> +static inline void mem_cgroup_update_page_stat(struct page *page,
>  				enum mem_cgroup_page_stat_item idx,
> -				int val);
> +				int val)
> +{
> +	if (mem_cgroup_disabled())
> +		return;
> +	__mem_cgroup_update_page_stat(page, idx, val);
> +}
> +
> +bool __mem_cgroup_begin_update_page_stats(struct page *page,
> +				unsigned long *flags);
> +static inline bool
> +mem_cgroup_begin_update_page_stats(struct page *page, unsigned long *flags)
> +{
> +	if (mem_cgroup_disabled())
> +		return false;
> +	return __mem_cgroup_begin_update_page_stats(page, flags);
> +}
> +
> +void __mem_cgroup_end_update_page_stats(struct page *page, bool locked,
> +				unsigned long *flags);
> +
> +static inline void
> +mem_cgroup_end_update_page_stats(struct page *page,
> +				bool locked, unsigned long *flags)
> +{
> +	if (mem_cgroup_disabled())
> +		return;
> +	__mem_cgroup_end_update_page_stats(page, locked, flags);
> +}
>
>  static inline void mem_cgroup_inc_page_stat(struct page *page,
>  				enum mem_cgroup_page_stat_item idx)
> @@ -171,6 +214,8 @@ void mem_cgroup_split_huge_fixup(struct page *head);
>  bool mem_cgroup_bad_page_check(struct page *page);
>  void mem_cgroup_print_bad_page(struct page *page);
>  #endif
> +
> +
>  #else /* CONFIG_CGROUP_MEM_RES_CTLR */
>  struct mem_cgroup;
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d183e1b..f4e6d5c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1831,27 +1831,50 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask)
>   * possibility of race condition. If there is, we take a lock.
>   */
>
> -void mem_cgroup_update_page_stat(struct page *page,
> -				enum mem_cgroup_page_stat_item idx, int val)
> +/*
> + * This function calls rcu_read_lock(). This lock is unlocked by
> + * __mem_cgroup_end_update_page_stat().
> + */
> +bool __mem_cgroup_begin_update_page_stats(struct page *page, unsigned long *flags)
>  {
>  	struct mem_cgroup *memcg;
>  	struct page_cgroup *pc = lookup_page_cgroup(page);
>  	bool need_unlock = false;
> -	unsigned long uninitialized_var(flags);
>
>  	rcu_read_lock();
>  	memcg = pc->mem_cgroup;
> -	if (unlikely(!memcg || !PageCgroupUsed(pc)))
> +	if (!memcg || !PageCgroupUsed(pc))
>  		goto out;
> -	/* pc->mem_cgroup is unstable ? */
>  	if (unlikely(mem_cgroup_stealed(memcg)) || PageTransHuge(page)) {
> -		/* take a lock against to access pc->mem_cgroup */
> -		move_lock_page_cgroup(pc, &flags);
> +		move_lock_page_cgroup(pc, flags);
>  		need_unlock = true;
> -		memcg = pc->mem_cgroup;
> -		if (!memcg || !PageCgroupUsed(pc))
> -			goto out;
>  	}
> +out:
> +	return need_unlock;
> +}
> +EXPORT_SYMBOL(__mem_cgroup_begin_update_page_stats);
> +
> +void __mem_cgroup_end_update_page_stats(struct page *page, bool locked,
> +				unsigned long *flags)
> +{
> +	struct page_cgroup *pc;
> +
> +	if (unlikely(locked)) {
> +		pc = lookup_page_cgroup(page);
> +		move_unlock_page_cgroup(pc, flags);
> +	}
> +	rcu_read_unlock();
> +}
> +EXPORT_SYMBOL(__mem_cgroup_end_update_page_stats);
> +
> +void __mem_cgroup_update_page_stat(struct page *page,
> +				enum mem_cgroup_page_stat_item idx, int val)
> +{
> +	struct page_cgroup *pc = lookup_page_cgroup(page);
> +	struct mem_cgroup *memcg = pc->mem_cgroup;
> +
> +	if (!memcg || !PageCgroupUsed(pc))
> +		return;
>
>  	switch (idx) {
>  	case MEMCG_NR_FILE_MAPPED:
> @@ -1866,14 +1889,9 @@ void mem_cgroup_update_page_stat(struct page *page,
>  	}
>
>  	this_cpu_add(memcg->stat->count[idx], val);
> -
> -out:
> -	if (unlikely(need_unlock))
> -		move_unlock_page_cgroup(pc, &flags);
> -	rcu_read_unlock();
>  	return;
>  }
> -EXPORT_SYMBOL(mem_cgroup_update_page_stat);
> +EXPORT_SYMBOL(__mem_cgroup_update_page_stat);
>
>  /*
>   * size of first charge trial. "32" comes from vmscan.c's magic value.
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 54d140a..3648c88 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1105,10 +1105,15 @@ void page_add_new_anon_rmap(struct page *page,
>   */
>  void page_add_file_rmap(struct page *page)
>  {
> +	unsigned long flags;
> +	bool locked;
> +
> +	locked = mem_cgroup_begin_update_page_stats(page, &flags);
>  	if (atomic_inc_and_test(&page->_mapcount)) {
>  		__inc_zone_page_state(page, NR_FILE_MAPPED);
>  		mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED);
>  	}
> +	mem_cgroup_end_update_page_stats(page, locked, &flags);
>  }
>
>  /**
> --
> 1.7.4.1
>
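If I read the patch right, the mlock accounting could then reuse the same
begin/end protocol instead of adding a new PCG_MLOCK page_cgroup flag,
mirroring the page_add_file_rmap() hunk above. Below is only a rough sketch
of what a hook in mm/mlock.c might look like; MEMCG_NR_MLOCK is a placeholder
stat item that the quoted patch does not provide, and the hook placement is a
guess on my side, not a posted implementation:

/*
 * Hypothetical hook in mm/mlock.c, on top of the patch quoted above.
 * MEMCG_NR_MLOCK is an assumed new mem_cgroup_page_stat_item.
 */
void mlock_vma_page(struct page *page)
{
	unsigned long flags;
	bool locked;

	BUG_ON(!PageLocked(page));

	/* Keep pc->mem_cgroup stable across the flag test and the stat update. */
	locked = mem_cgroup_begin_update_page_stats(page, &flags);
	if (!TestSetPageMlocked(page)) {
		inc_zone_page_state(page, NR_MLOCK);
		/* per-memcg counterpart of NR_MLOCK (placeholder name) */
		mem_cgroup_inc_page_stat(page, MEMCG_NR_MLOCK);
		count_vm_event(UNEVICTABLE_PGMLOCKED);
		if (!isolate_lru_page(page))
			putback_lru_page(page);
	}
	mem_cgroup_end_update_page_stats(page, locked, &flags);
}

The TestSetPageMlocked() transition would play the same role as the
_mapcount transition in page_add_file_rmap(): the page flag change and the
memcg counter update happen inside one begin/end section, so the stat cannot
be charged to the wrong memcg while the page is being moved. The munlock
path would presumably do the symmetric TestClearPageMlocked() /
mem_cgroup_dec_page_stat() under the same protocol.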