Re: [BUGFIX][PATCH v2] add mem_cgroup_replace_page_cache.

KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> · Mon, 12 Dec 2011 09:48:05 +0900

On Fri, 9 Dec 2011 12:37:01 -0800
Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Thu, 8 Dec 2011 16:18:29 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> 
> > commit ef6a3c6311 added a function replace_page_cache_page(). This
> > function replaces a page in the radix-tree with a new page.
> > When doing this, memory cgroup needs to fix up the accounting information;
> > memcg needs to check the PCG_USED bit, etc.
> > 
> > In some (many?) cases, 'newpage' is already on the LRU before
> > replace_page_cache_page() is called, so memcg's LRU accounting
> > information must be fixed up, too.
> > 
> > This patch adds mem_cgroup_replace_page_cache() and removes the old hooks.
> > In that function, the old page is unaccounted without touching res_counter,
> > and the new page is accounted to the memcg of the old page. While
> > overwriting pc->mem_cgroup of newpage, it takes zone->lru_lock to avoid
> > racing with LRU handling.
> > 
> > Background:
> >   replace_page_cache_page() is called by FUSE code in its splice() handling.
> >   Here, 'newpage' replaces oldpage, but newpage is not a newly allocated
> >   page and may already be on the LRU. LRU mis-accounting is critical for
> >   memory cgroup because rmdir() checks that the whole LRU is empty and that
> >   there is no accounting leak. If a page is on a different LRU than the one
> >   it should be on, rmdir() will fail.
> > 
> > Changelog: v1 -> v2
> >   - fixed missing mem_cgroup_disabled() check.
> >   - added comments.
> > 
> > Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> > ---
> >  include/linux/memcontrol.h |    6 ++++++
> >  mm/filemap.c               |   18 ++----------------
> >  mm/memcontrol.c            |   44 ++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 52 insertions(+), 16 deletions(-)
> 
> It's a relatively intrusive patch and I'm a bit concerned about
> feeding it into 3.2.
> 
> How serious is the bug, and which kernel version(s) do you think we
> should fix it in?

This bug was introduced by commit ef6a3c63112e (March 2011), but there has
been no bug report yet. I guess there are not many people who use memcg and
FUSE at the same time with upstream kernels.

The result of this bug is that the admin cannot destroy a memcg because of
the accounting leak. So there is no panic and no deadlock. And even if an
active cgroup exists, umount can succeed, so there is no problem at shutdown.
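To make the failure mode concrete, a deliberately simplified userspace model
in C may help. Everything in it (model_memcg, model_page, recharge_naive(),
recharge_lrucare(), memcg_rmdir(), run_scenario()) is hypothetical and is not
the memcontrol.c code. It assumes LRU statistics are charged to whatever memcg
the page currently points at (the role of pc->mem_cgroup), and it assumes the
fix amounts to deleting the page from the old owner's LRU accounting,
overwriting the owner, and re-adding it, all while holding the LRU lock. That
is one reading of the changelog, not a copy of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model; NOT the kernel's data structures. */
struct model_memcg {
	const char *name;
	long lru_pages;   /* pages this memcg believes are on its LRU lists */
};

struct model_page {
	struct model_memcg *owner;  /* plays the role of pc->mem_cgroup */
	bool on_lru;
};

/* Stand-in for zone->lru_lock. The model is single-threaded, so this flag
 * only records where the lock would be held; lru_add/lru_del assert it. */
static bool lru_locked;

/* LRU accounting goes to whichever memcg the page points at right now. */
static void lru_add(struct model_page *p)
{
	assert(lru_locked);
	p->owner->lru_pages++;
	p->on_lru = true;
}

static void lru_del(struct model_page *p)
{
	assert(lru_locked);
	p->owner->lru_pages--;
	p->on_lru = false;
}

/* Buggy variant: overwrite the owner without touching LRU accounting. */
static void recharge_naive(struct model_page *p, struct model_memcg *to)
{
	p->owner = to;
}

/* Fixed variant (assumed): under the LRU lock, move the page's LRU
 * accounting from the old owner to the new one. */
static void recharge_lrucare(struct model_page *p, struct model_memcg *to)
{
	lru_locked = true;
	bool was_on_lru = p->on_lru;
	if (was_on_lru)
		lru_del(p);          /* uncount from the old owner */
	p->owner = to;
	if (was_on_lru)
		lru_add(p);          /* count against the new owner */
	lru_locked = false;
}

/* Freeing a page removes it from the LRU of its *current* owner. */
static void page_release(struct model_page *p)
{
	lru_locked = true;
	if (p->on_lru)
		lru_del(p);
	lru_locked = false;
}

/* Model of rmdir: succeeds only once the memcg's LRU count drains to 0. */
static int memcg_rmdir(const struct model_memcg *m)
{
	return m->lru_pages == 0 ? 0 : -EBUSY;
}

/* A page already on A's LRU is handed over to B and later freed.
 * Returns the result of trying to rmdir A afterwards. */
static int run_scenario(bool use_fix, struct model_memcg *a,
			struct model_memcg *b)
{
	struct model_page page = { .owner = a, .on_lru = false };

	lru_locked = true;
	lru_add(&page);
	lru_locked = false;

	if (use_fix)
		recharge_lrucare(&page, b);
	else
		recharge_naive(&page, b);

	page_release(&page);
	return memcg_rmdir(a);
}
```

With use_fix false, run_scenario() leaves A holding one phantom LRU page (and
B at -1), so memcg_rmdir(A) returns -EBUSY: the "cannot destroy a memcg"
symptom, with no panic or deadlock. With use_fix true, both counters drain to
zero and the rmdir succeeds.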

I'd like this fix to be merged when, or after, the unify-lru work goes upstream.

Thanks,
-Kame
