Re: [PATCH] cgroup: use an ordered workqueue for cgroup destruction

On Fri, 7 Feb 2014, Johannes Weiner wrote:
> On Thu, Feb 06, 2014 at 03:56:01PM -0800, Hugh Dickins wrote:
> > Sometimes the cleanup after memcg hierarchy testing gets stuck in
> > mem_cgroup_reparent_charges(), unable to bring non-kmem usage down to 0.
> > 
> > There may turn out to be several causes, but a major cause is this: the
> > workitem to offline parent can get run before workitem to offline child;
> > parent's mem_cgroup_reparent_charges() circles around waiting for the
> > child's pages to be reparented to its lrus, but it's holding cgroup_mutex
> > which prevents the child from reaching its mem_cgroup_reparent_charges().
> > 
> > Just use an ordered workqueue for cgroup_destroy_wq.
> > 
> > Fixes: e5fca243abae ("cgroup: use a dedicated workqueue for cgroup destruction")
> > Suggested-by: Filipe Brandenburger <filbranden@xxxxxxxxxx>
> > Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx # 3.10+
> 
> I think this is a good idea for now and -stable:
> Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>

You might be wondering why this patch didn't reach Linus yet.

It's because more thorough testing, by others here, found that it
wasn't always solving the problem: so I asked Tejun privately to
hold off from sending it in, until we'd worked out why not.

Most of our testing being on a v3.11-based kernel, it was perfectly
possible that the problem was merely our own: for example, missing
Tejun's 8a2b75384444 ("workqueue: fix ordered workqueues in NUMA setups").

But that turned out not to be enough to fix it either. Then Filipe
pointed out how percpu_ref_kill_and_confirm() uses call_rcu_sched()
before we ever get to put the offline on to the workqueue: by the
time we get to the workqueue, the ordering has already been lost.
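
To illustrate the point, here is a toy model in Python, not kernel code.
The names destroy(), kill_order and callback_order are made up for the
sketch, and the order in which the deferred callbacks fire is simply a
parameter: it stands in for the call_rcu_sched() stage, and the model
assumes nothing about how RCU actually orders callbacks.  It only shows
that a strictly FIFO queue preserves the order in which items are queued,
not the order in which the cgroups were killed.

```python
from collections import deque


def destroy(kill_order, callback_order):
    """Return the order in which the offline work items run.

    kill_order:     order in which the cgroups are killed (child first).
    callback_order: order in which the deferred callbacks happen to fire;
                    a free parameter standing in for the RCU stage.
    """
    pending = set(kill_order)        # callbacks registered, not yet fired
    ordered_wq = deque()             # strictly FIFO, like an ordered workqueue

    for name in callback_order:      # callbacks fire in their own order
        pending.remove(name)
        ordered_wq.append(name)      # only now is the offline work queued

    assert not pending               # every killed cgroup got its callback

    ran = []
    while ordered_wq:                # the queue itself never reorders
        ran.append(ordered_wq.popleft())
    return ran


kill_order = ["child", "parent"]
print(destroy(kill_order, ["child", "parent"]))   # callbacks keep kill order
print(destroy(kill_order, ["parent", "child"]))   # callbacks arrive swapped
```

In the second call the parent's offline runs before the child's, even
though the queue is ordered and the child was killed first.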

So, thanks for the Acks, but I'm afraid that this ordered workqueue
solution is just not good enough: we should simply forget that patch
and provide a different answer.

So I'm now posting a couple of alternative solutions: 1/2 from Filipe
at the memcg end, and 2/2 from me at the cgroup end.  Each of these
has stood up to better testing, so you can choose between them,
or work out a better answer.

(By the way, I have another little pair of memcg/cgroup fixes to post
shortly, nothing to do with these two: it would be less confusing if
I had some third fix to add in there, but sadly not.)

Hugh
