On Fri, 7 Feb 2014, Johannes Weiner wrote:
> On Thu, Feb 06, 2014 at 03:56:01PM -0800, Hugh Dickins wrote:
> > Sometimes the cleanup after memcg hierarchy testing gets stuck in
> > mem_cgroup_reparent_charges(), unable to bring non-kmem usage down to 0.
> >
> > There may turn out to be several causes, but a major cause is this: the
> > workitem to offline parent can get run before workitem to offline child;
> > parent's mem_cgroup_reparent_charges() circles around waiting for the
> > child's pages to be reparented to its lrus, but it's holding cgroup_mutex
> > which prevents the child from reaching its mem_cgroup_reparent_charges().
> >
> > Just use an ordered workqueue for cgroup_destroy_wq.
> >
> > Fixes: e5fca243abae ("cgroup: use a dedicated workqueue for cgroup destruction")
> > Suggested-by: Filipe Brandenburger <filbranden@xxxxxxxxxx>
> > Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx # 3.10+
>
> I think this is a good idea for now and -stable:
> Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>

You might be wondering why this patch didn't reach Linus yet.

It's because more thorough testing, by others here, found that it
wasn't always solving the problem: so I asked Tejun privately to
hold off from sending it in, until we'd worked out why not.

Most of our testing being on a v3.11-based kernel, it was perfectly
possible that the problem was merely our own, e.g. missing Tejun's
8a2b75384444 ("workqueue: fix ordered workqueues in NUMA setups").

But that turned out not to be enough to fix it either. Then Filipe
pointed out how percpu_ref_kill_and_confirm() uses call_rcu_sched()
before we ever get to put the offline on to the workqueue: by the
time we get to the workqueue, the ordering has already been lost.

So, thanks for the Acks, but I'm afraid that this ordered workqueue
solution is just not good enough: we should simply forget that patch
and provide a different answer.
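
To make Filipe's point concrete, here is a toy userspace model in
Python (not kernel code, and not taken from either patch). It assumes
each ref kill is followed by a confirm callback that fires after some
delay, and that only the callback queues the offline work item. The
function name offline_order() and the confirm_delay values are
hypothetical, made up for illustration; they stand in for whatever
separates the call_rcu_sched() callbacks in practice.

```python
from collections import deque


def offline_order(kill_order, confirm_delay):
    """Return the order in which offline work items would run.

    kill_order:    cgroup names, in the order their refs are killed.
    confirm_delay: hypothetical delay per cgroup between the kill and
                   its confirm callback (an assumption of this model).
    """
    # Stage 1: the kill at step t schedules a confirm callback that
    # fires at t + delay. Nothing is on the workqueue yet.
    pending = [
        (t + confirm_delay[name], t, name)
        for t, name in enumerate(kill_order)
    ]

    # Stage 2: callbacks fire in time order, and each one queues the
    # offline work item on the ordered (strictly FIFO) workqueue.
    ordered_wq = deque()
    for _fire_time, _kill_time, name in sorted(pending):
        ordered_wq.append(name)

    # The ordered workqueue preserves queueing order, not kill order.
    return list(ordered_wq)


# Equal delays: kill order survives.
print(offline_order(["child", "parent"], {"child": 0, "parent": 0}))
# Parent's callback fires first: the FIFO runs parent before child.
print(offline_order(["child", "parent"], {"child": 5, "parent": 1}))
```

In this model the queue is perfectly ordered, yet the parent can
still be offlined before the child, because the order was decided
before anything was queued.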

So I'm now posting a couple of alternative solutions: 1/2 from Filipe
at the memcg end, and 2/2 from me at the cgroup end. Each of these
has stood up to better testing, so you can choose between them,
or work out a better answer.

(By the way, I have another little pair of memcg/cgroup fixes to post
shortly, nothing to do with these two: it would be less confusing if
I had some third fix to add in there, but sadly not.)

Hugh