Michal Hocko <mhocko@xxxxxxx> writes:

> On Wed 18-07-12 14:26:36, Andrew Morton wrote:
>>
>> The patch titled
>>      Subject: hugetlb/cgroup: simplify pre_destroy callback
>> has been added to the -mm tree.  Its filename is
>>      hugetlb-cgroup-simplify-pre_destroy-callback.patch
>>
>> Before you just go and hit "reply", please:
>>    a) Consider who else should be cc'ed
>>    b) Prefer to cc a suitable mailing list as well
>>    c) Ideally: find the original patch on the mailing list and do a
>>       reply-to-all to that, adding suitable additional cc's
>>
>> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>>
>> The -mm tree is included into linux-next and is updated
>> there every 3-4 working days
>>
>> ------------------------------------------------------
>> From: "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
>> Subject: hugetlb/cgroup: simplify pre_destroy callback
>>
>> Since we cannot fail in hugetlb_cgroup_move_parent(), we don't really need
>> to check whether the cgroup has any charge left after that.  Also skip those
>> hstates for which we don't have any charge in this cgroup.
>
> IIUC this depends on a non-existent (cgroup) patch.  I guess something
> like the patch at the end should address it.  I haven't tested it though,
> so it is not signed-off yet.
>
>> Based on an earlier patch from Wanpeng Li.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Wanpeng Li <liwanp@xxxxxxxxxxxxxxxxxx>
>> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
>> Cc: Michal Hocko <mhocko@xxxxxxx>
>> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> ---
>>
>>  mm/hugetlb_cgroup.c |   49 ++++++++++++++++++------------------------
>>  1 file changed, 21 insertions(+), 28 deletions(-)
>>
>> diff -puN mm/hugetlb_cgroup.c~hugetlb-cgroup-simplify-pre_destroy-callback mm/hugetlb_cgroup.c
>> --- a/mm/hugetlb_cgroup.c~hugetlb-cgroup-simplify-pre_destroy-callback
>> +++ a/mm/hugetlb_cgroup.c
>> @@ -65,18 +65,6 @@ static inline struct hugetlb_cgroup *par
>>  	return hugetlb_cgroup_from_cgroup(cg->parent);
>>  }
>>
>> -static inline bool hugetlb_cgroup_have_usage(struct cgroup *cg)
>> -{
>> -	int idx;
>> -	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cg);
>> -
>> -	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
>> -		if ((res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE)) > 0)
>> -			return true;
>> -	}
>> -	return false;
>> -}
>> -
>>  static struct cgroup_subsys_state *hugetlb_cgroup_create(struct cgroup *cgroup)
>>  {
>>  	int idx;
>> @@ -159,24 +147,29 @@ static int hugetlb_cgroup_pre_destroy(st
>>  {
>>  	struct hstate *h;
>>  	struct page *page;
>> -	int ret = 0, idx = 0;
>> +	int ret = 0, idx;
>> +	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
>>
>> -	do {
>> -		if (cgroup_task_count(cgroup) ||
>> -		    !list_empty(&cgroup->children)) {
>> -			ret = -EBUSY;
>> -			goto out;
>> -		}
>> -		for_each_hstate(h) {
>> -			spin_lock(&hugetlb_lock);
>> -			list_for_each_entry(page, &h->hugepage_activelist, lru)
>> -				hugetlb_cgroup_move_parent(idx, cgroup, page);
>>
>> -			spin_unlock(&hugetlb_lock);
>> -			idx++;
>> -		}
>> -		cond_resched();
>> -	} while (hugetlb_cgroup_have_usage(cgroup));
>> +	if (cgroup_task_count(cgroup) ||
>> +	    !list_empty(&cgroup->children)) {
>> +		ret = -EBUSY;
>> +		goto out;
>> +	}
>> +
>> +	for_each_hstate(h) {
>> +		/*
>> +		 * if we don't have any charge, skip this hstate
>> +		 */
>> +		idx = hstate_index(h);
>> +		if (res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE) == 0)
>> +			continue;
>> +		spin_lock(&hugetlb_lock);
>> +		list_for_each_entry(page, &h->hugepage_activelist, lru)
>> +			hugetlb_cgroup_move_parent(idx, cgroup, page);
>> +		spin_unlock(&hugetlb_lock);
>> +		VM_BUG_ON(res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE));
>> +	}
>>  out:
>>  	return ret;
>>  }
>> _
>
> ---
> From 621ed1c9dab63bd82205bd5266eb9974f86a0a3f Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@xxxxxxx>
> Date: Thu, 19 Jul 2012 13:23:23 +0200
> Subject: [PATCH] cgroup: keep cgroup_mutex locked for pre_destroy
>
> 3fa59dfb (cgroup: fix potential deadlock in pre_destroy) dropped the
> cgroup_mutex lock while calling pre_destroy callbacks because the memory
> controller could deadlock when force_empty triggered reclaim.
> Since "memcg: move charges to root cgroup if use_hierarchy=0" there is
> no reclaim going on from mem_cgroup_force_empty, though, so we can safely
> keep the cgroup_mutex locked.  This has the advantage that no tasks can
> be added during the pre_destroy callback, so the handlers don't have to
> consider races in which new tasks add new charges.  This simplifies the
> implementation.
> ---
>  kernel/cgroup.c |    2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 0f3527d..9dba05d 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -4181,7 +4181,6 @@ again:
>  		mutex_unlock(&cgroup_mutex);
>  		return -EBUSY;
>  	}
> -	mutex_unlock(&cgroup_mutex);
>
>  	/*
>  	 * In general, subsystem has no css->refcnt after pre_destroy(). But
> @@ -4204,7 +4203,6 @@ again:
>  		return ret;
>  	}
>
> -	mutex_lock(&cgroup_mutex);
>  	parent = cgrp->parent;
>  	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
>  		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);

mem_cgroup_force_empty still calls

  lru_add_drain_all
    -> schedule_on_each_cpu
      -> get_online_cpus
        -> mutex_lock(&cpu_hotplug.lock);

So won't we deadlock?

-aneesh