Re: memcg: add mlock statistic in memory.stat

On Wed, Jan 11, 2012 at 3:59 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> On Wed, 11 Jan 2012 15:17:42 -0800 (PST)
> Hugh Dickins <hughd@xxxxxxxxxx> wrote:
>
>> On Wed, 11 Jan 2012, Ying Han wrote:
>>
>> > We have the nr_mlock stat in both meminfo and vmstat system wide; this
>> > patch adds an mlock field to the per-memcg memory stat. The stat enhances
>> > the metrics exported by memcg, especially when used together with the
>> > "unevictable" lru stat.
>> >
>> > --- a/include/linux/page_cgroup.h
>> > +++ b/include/linux/page_cgroup.h
>> > @@ -10,6 +10,7 @@ enum {
>> >     /* flags for mem_cgroup and file and I/O status */
>> >     PCG_MOVE_LOCK, /* For race between move_account v.s. following bits */
>> >     PCG_FILE_MAPPED, /* page is accounted as "mapped" */
>> > +   PCG_MLOCK, /* page is accounted as "mlock" */
>> >     /* No lock in page_cgroup */
>> >     PCG_ACCT_LRU, /* page has been accounted for (under lru_lock) */
>> >     __NR_PCG_FLAGS,
>>
>> Is this really necessary?  KAMEZAWA-san is engaged in trying to reduce
>> the number of PageCgroup flags, and I expect that in due course we shall
>> want to merge them in with Page flags, so adding more is unwelcome.
>> I'd  have thought that with memcg_ hooks in the right places,
>> a separate flag would not be necessary?
>>
>
> Please don't ;)
>
> NR_UNEVICTABLE_LRU is not enough?

Seems not.

The unevictable lru includes more than mlock()'d pages (SHM_LOCK'd
etc). There are use cases where we'd like to know the mlocked size
per-cgroup. We used to achieve that in the fake-NUMA based container by
reading the value from the per-node meminfo; however, we miss that
information in memcg. What do you think?

Thank you Hugh and Kame for the reference. Apparently I missed that
patch; I will take a look at it (still catching up on email after
vacation).

--Ying

>
> Following is the patch I posted before to remove PCG_FILE_MAPPED.
> Then, I think you can use similar logic and make use of UNEVICTABLE flags.
>
> ==
> A better (lockless) idea is welcome.
>
> From fd2b5822838eebbacc41f343f9eb8c6f0ad8e1cc Mon Sep 17 00:00:00 2001
> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> Date: Thu, 15 Dec 2011 11:42:49 +0900
> Subject: [PATCH 2/5] memcg: safer page stat updating
>
> Now, page stat accounting is done like this:
>
>     if (....set flag or some)
>            update vmstat
>            update memcg's stat
>
> Unlike vmstat, memcg must take care of changes in pc->mem_cgroup.
> This is done with page_cgroup_move_lock and a per-stat flag.
>
> I think FileMapped works well. But, considering updates of other
> statistics, the current logic doesn't work well. Assume the following
> race (time flows downward; the right column runs on another CPU):
>
>     set flag
>     ..(delayed by preemption)..
>                                            clear flag
>                                            pc's flag is unset, so
>                                            don't update anything.
>     memcg = pc->mem_cgroup
>     set pc's flag
>     update memcg stat
>
> In this case, the stat update is leaked. I think memcg's accounting
> routine should not rely on such flags. To avoid using memcg's own
> flags, we need to prevent pc->mem_cgroup from being overwritten while
> we are updating the memcg.
>
> This patch adds
>   - mem_cgroup_begin_update_page_stats(),
>   - mem_cgroup_end_update_page_stats()
>
> And guarantees pc->mem_cgroup is not overwritten while updating.
> The caller should do
>
>     mem_cgroup_begin_update_page_stats()
>     if (.... set flag or some)
>            update vmstat
>            update memcg's stat
>     mem_cgroup_end_update_page_stats().
>
> This begin...end pair checks a counter (which is 0 in most cases) under
> rcu_read_lock/rcu_read_unlock, and takes a spinlock only if required.
>
> The following patch in this series removes the PCG_FILE_MAPPED flag.
> ---
>  include/linux/memcontrol.h |   49 +++++++++++++++++++++++++++++++++++++++++-
>  mm/memcontrol.c            |   50 +++++++++++++++++++++++++++++--------------
>  mm/rmap.c                  |    5 ++++
>  3 files changed, 86 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 598b3c9..4a61c4b 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -141,9 +141,52 @@ static inline bool mem_cgroup_disabled(void)
>        return false;
>  }
>
> -void mem_cgroup_update_page_stat(struct page *page,
> +/*
> + * Unlike vmstat, a page's mem_cgroup can be overwritten, so the memcg
> + * to which the page stats should be accounted is determined dynamically.
> + * Unfortunately, there are many races. To avoid them, the caller should do
> + *
> + * locked = mem_cgroup_begin_update_page_stats(page, &flags);
> + * if (set page flags etc)
> + *     mem_cgroup_update_page_stat(page, idx, val);
> + * mem_cgroup_end_update_page_stats(page, locked, &flags);
> + *
> + * Between the [begin, end) calls, the page's mem_cgroup will not be changed.
> + */
> +void __mem_cgroup_update_page_stat(struct page *page,
> +                       enum mem_cgroup_page_stat_item idx,
> +                       int val);
> +
> +static inline void mem_cgroup_update_page_stat(struct page *page,
>                                 enum mem_cgroup_page_stat_item idx,
> -                                int val);
> +                                int val)
> +{
> +       if (mem_cgroup_disabled())
> +               return;
> +       __mem_cgroup_update_page_stat(page, idx, val);
> +}
> +
> +bool __mem_cgroup_begin_update_page_stats(struct page *page,
> +               unsigned long *flags);
> +static inline bool
> +mem_cgroup_begin_update_page_stats(struct page *page, unsigned long *flags)
> +{
> +       if (mem_cgroup_disabled())
> +               return false;
> +       return __mem_cgroup_begin_update_page_stats(page, flags);
> +}
> +
> +void __mem_cgroup_end_update_page_stats(struct page *page, bool locked,
> +               unsigned long *flags);
> +
> +static inline void
> +mem_cgroup_end_update_page_stats(struct page *page,
> +               bool locked, unsigned long *flags)
> +{
> +       if (mem_cgroup_disabled())
> +               return;
> +       __mem_cgroup_end_update_page_stats(page, locked, flags);
> +}
>
>  static inline void mem_cgroup_inc_page_stat(struct page *page,
>                                            enum mem_cgroup_page_stat_item idx)
> @@ -171,6 +214,8 @@ void mem_cgroup_split_huge_fixup(struct page *head);
>  bool mem_cgroup_bad_page_check(struct page *page);
>  void mem_cgroup_print_bad_page(struct page *page);
>  #endif
> +
> +
>  #else /* CONFIG_CGROUP_MEM_RES_CTLR */
>  struct mem_cgroup;
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d183e1b..f4e6d5c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1831,27 +1831,50 @@ bool mem_cgroup_handle_oom(struct mem_cgroup *memcg, gfp_t mask)
>  * possibility of race condition. If there is, we take a lock.
>  */
>
> -void mem_cgroup_update_page_stat(struct page *page,
> -                                enum mem_cgroup_page_stat_item idx, int val)
> +/*
> + * This function calls rcu_read_lock(). The lock is released by
> + * __mem_cgroup_end_update_page_stats().
> + */
> +bool __mem_cgroup_begin_update_page_stats(struct page *page, unsigned long *flags)
>  {
>        struct mem_cgroup *memcg;
>        struct page_cgroup *pc = lookup_page_cgroup(page);
>        bool need_unlock = false;
> -       unsigned long uninitialized_var(flags);
>
>        rcu_read_lock();
>        memcg = pc->mem_cgroup;
> -       if (unlikely(!memcg || !PageCgroupUsed(pc)))
> +       if (!memcg || !PageCgroupUsed(pc))
>                goto out;
> -       /* pc->mem_cgroup is unstable ? */
>        if (unlikely(mem_cgroup_stealed(memcg)) || PageTransHuge(page)) {
> -               /* take a lock against to access pc->mem_cgroup */
> -               move_lock_page_cgroup(pc, &flags);
> +               move_lock_page_cgroup(pc, flags);
>                need_unlock = true;
> -               memcg = pc->mem_cgroup;
> -               if (!memcg || !PageCgroupUsed(pc))
> -                       goto out;
>        }
> +out:
> +       return need_unlock;
> +}
> +EXPORT_SYMBOL(__mem_cgroup_begin_update_page_stats);
> +
> +void __mem_cgroup_end_update_page_stats(struct page *page, bool locked,
> +                       unsigned long *flags)
> +{
> +       struct page_cgroup *pc;
> +
> +       if (unlikely(locked)) {
> +               pc = lookup_page_cgroup(page);
> +               move_unlock_page_cgroup(pc, flags);
> +       }
> +       rcu_read_unlock();
> +}
> +EXPORT_SYMBOL(__mem_cgroup_end_update_page_stats);
> +
> +void __mem_cgroup_update_page_stat(struct page *page,
> +                                enum mem_cgroup_page_stat_item idx, int val)
> +{
> +       struct page_cgroup *pc = lookup_page_cgroup(page);
> +       struct mem_cgroup *memcg = pc->mem_cgroup;
> +
> +       if (!memcg || !PageCgroupUsed(pc))
> +               return;
>
>        switch (idx) {
>        case MEMCG_NR_FILE_MAPPED:
> @@ -1866,14 +1889,9 @@ void mem_cgroup_update_page_stat(struct page *page,
>        }
>
>        this_cpu_add(memcg->stat->count[idx], val);
> -
> -out:
> -       if (unlikely(need_unlock))
> -               move_unlock_page_cgroup(pc, &flags);
> -       rcu_read_unlock();
>        return;
>  }
> -EXPORT_SYMBOL(mem_cgroup_update_page_stat);
> +EXPORT_SYMBOL(__mem_cgroup_update_page_stat);
>
>  /*
>  * size of first charge trial. "32" comes from vmscan.c's magic value.
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 54d140a..3648c88 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1105,10 +1105,15 @@ void page_add_new_anon_rmap(struct page *page,
>  */
>  void page_add_file_rmap(struct page *page)
>  {
> +       unsigned long flags;
> +       bool locked;
> +
> +       locked = mem_cgroup_begin_update_page_stats(page, &flags);
>        if (atomic_inc_and_test(&page->_mapcount)) {
>                __inc_zone_page_state(page, NR_FILE_MAPPED);
>                mem_cgroup_inc_page_stat(page, MEMCG_NR_FILE_MAPPED);
>        }
> +       mem_cgroup_end_update_page_stats(page, locked, &flags);
>  }
>
>  /**
> --
> 1.7.4.1
>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .