Re: [Regression] 3.19-rc3 : memcg: Hang in mount memcg

Johannes Weiner <hannes@xxxxxxxxxxx> · Thu, 22 Jan 2015 10:19:43 -0500

Hi,

On Thu, Jan 22, 2015 at 09:34:54AM -0500, Tejun Heo wrote:
> On Thu, Jan 22, 2015 at 08:45:50AM -0500, Johannes Weiner wrote:
> > diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> > index bb263d0caab3..9a09308c8066 100644
> > --- a/kernel/cgroup.c
> > +++ b/kernel/cgroup.c
> > @@ -1819,8 +1819,11 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
> >  			goto out_unlock;
> >  		}
> >  
> > -		if (root->flags ^ opts.flags)
> > -			pr_warn("new mount options do not match the existing superblock, will be ignored\n");
> > +		if (root->flags ^ opts.flags) {
> > +			pr_warn("new mount options do not match the existing superblock\n");
> > +			ret = -EBUSY;
> > +			goto out_unlock;
> > +		}
> 
> Do we really need the above chunk?

Inform and ignore or fail hard?  I guess we can drop this hunk and
keep with the current behavior.

> > @@ -1909,7 +1912,7 @@ static void cgroup_kill_sb(struct super_block *sb)
> >  	 *
> >  	 * And don't kill the default root.
> >  	 */
> > -	if (css_has_online_children(&root->cgrp.self) ||
> > +	if (!list_empty(&root->cgrp.self.children) ||
> >  	    root == &cgrp_dfl_root)
> >  		cgroup_put(&root->cgrp);
> 
> I tried to do something a bit more advanced so that eventual async
> release of dying children, if they happen, can also release the
> hierarchy but I don't think it really matters unless we can forcefully
> drain.  So, shouldn't just the above part be enough?

Yep, I'd be fine with that.

---

>From 3d7ae5aeb16ce6118d8bff17194e791339a1f06c Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@xxxxxxxxxxx>
Date: Thu, 22 Jan 2015 08:16:31 -0500
Subject: [patch] kernel: cgroup: prevent mount hang due to memory controller
 lifetime

Since b2052564e66d ("mm: memcontrol: continue cache reclaim from
offlined groups"), re-mounting the memory controller after using it is
very likely to hang.

The cgroup core assumes that any remaining references after deleting a
cgroup are temporary in nature, and synchroneously waits for them, but
the above-mentioned commit has left-over page cache pin its css until
it is reclaimed naturally.  That being said, swap entries and charged
kernel memory have been doing the same indefinite pinning forever, the
bug is just more likely to trigger with left-over page cache.

Reparenting kernel memory is highly impractical, which leaves changing
the cgroup assumptions to reflect this: once a controller has been
mounted and used, it has internal state that is independent from mount
and cgroup lifetime.  It can be unmounted and remounted, but it can't
be reconfigured during subsequent mounts.

Don't offline the controller root as long as there are any children,
dead or alive.  A remount will no longer wait for these old references
to drain, it will simply mount the persistent controller state again.

Reported-by: "Suzuki K. Poulose" <Suzuki.Poulose@xxxxxxx>
Reported-by: Will Deacon <will.deacon@xxxxxxx>
Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
---
 kernel/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index bb263d0caab3..04cfe8ace520 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1909,7 +1909,7 @@ static void cgroup_kill_sb(struct super_block *sb)
 	 *
 	 * And don't kill the default root.
 	 */
-	if (css_has_online_children(&root->cgrp.self) ||
+	if (!list_empty(&root->cgrp.self.children) ||
 	    root == &cgrp_dfl_root)
 		cgroup_put(&root->cgrp);
 	else
-- 
2.2.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>