Re: + hugetlb-cgroup-simplify-pre_destroy-callback.patch added to -mm tree

Michal Hocko <mhocko@xxxxxxx> · Thu, 19 Jul 2012 13:39:15 +0200

On Wed 18-07-12 14:26:36, Andrew Morton wrote:
> 
> The patch titled
>      Subject: hugetlb/cgroup: simplify pre_destroy callback
> has been added to the -mm tree.  Its filename is
>      hugetlb-cgroup-simplify-pre_destroy-callback.patch
> 
> Before you just go and hit "reply", please:
>    a) Consider who else should be cc'ed
>    b) Prefer to cc a suitable mailing list as well
>    c) Ideally: find the original patch on the mailing list and do a
>       reply-to-all to that, adding suitable additional cc's
> 
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
> 
> The -mm tree is included into linux-next and is updated
> there every 3-4 working days
> 
> ------------------------------------------------------
> From: "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
> Subject: hugetlb/cgroup: simplify pre_destroy callback
> 
> Since we cannot fail in hugetlb_cgroup_move_parent(), we don't really need
> to check whether cgroup have any change left after that.  Also skip those
> hstates for which we don't have any charge in this cgroup.

IIUC this depends on a non-existent (cgroup) patch. I guess something
like the patch at the end should address it. I haven't tested it though
so it is not signed-off-by yet.

> Based on an earlier patch from Wanpeng Li.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Wanpeng Li <liwanp@xxxxxxxxxxxxxxxxxx>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> ---
> 
>  mm/hugetlb_cgroup.c |   49 ++++++++++++++++++------------------------
>  1 file changed, 21 insertions(+), 28 deletions(-)
> 
> diff -puN mm/hugetlb_cgroup.c~hugetlb-cgroup-simplify-pre_destroy-callback mm/hugetlb_cgroup.c
> --- a/mm/hugetlb_cgroup.c~hugetlb-cgroup-simplify-pre_destroy-callback
> +++ a/mm/hugetlb_cgroup.c
> @@ -65,18 +65,6 @@ static inline struct hugetlb_cgroup *par
>  	return hugetlb_cgroup_from_cgroup(cg->parent);
>  }
>  
> -static inline bool hugetlb_cgroup_have_usage(struct cgroup *cg)
> -{
> -	int idx;
> -	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cg);
> -
> -	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
> -		if ((res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE)) > 0)
> -			return true;
> -	}
> -	return false;
> -}
> -
>  static struct cgroup_subsys_state *hugetlb_cgroup_create(struct cgroup *cgroup)
>  {
>  	int idx;
> @@ -159,24 +147,29 @@ static int hugetlb_cgroup_pre_destroy(st
>  {
>  	struct hstate *h;
>  	struct page *page;
> -	int ret = 0, idx = 0;
> +	int ret = 0, idx;
> +	struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_cgroup(cgroup);
>  
> -	do {
> -		if (cgroup_task_count(cgroup) ||
> -		    !list_empty(&cgroup->children)) {
> -			ret = -EBUSY;
> -			goto out;
> -		}
> -		for_each_hstate(h) {
> -			spin_lock(&hugetlb_lock);
> -			list_for_each_entry(page, &h->hugepage_activelist, lru)
> -				hugetlb_cgroup_move_parent(idx, cgroup, page);
>  
> -			spin_unlock(&hugetlb_lock);
> -			idx++;
> -		}
> -		cond_resched();
> -	} while (hugetlb_cgroup_have_usage(cgroup));
> +	if (cgroup_task_count(cgroup) ||
> +	    !list_empty(&cgroup->children)) {
> +		ret = -EBUSY;
> +		goto out;
> +	}
> +
> +	for_each_hstate(h) {
> +		/*
> +		 * if we don't have any charge, skip this hstate
> +		 */
> +		idx = hstate_index(h);
> +		if (res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE) == 0)
> +			continue;
> +		spin_lock(&hugetlb_lock);
> +		list_for_each_entry(page, &h->hugepage_activelist, lru)
> +			hugetlb_cgroup_move_parent(idx, cgroup, page);
> +		spin_unlock(&hugetlb_lock);
> +		VM_BUG_ON(res_counter_read_u64(&h_cg->hugepage[idx], RES_USAGE));
> +	}
>  out:
>  	return ret;
>  }
> _

---
>From 621ed1c9dab63bd82205bd5266eb9974f86a0a3f Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@xxxxxxx>
Date: Thu, 19 Jul 2012 13:23:23 +0200
Subject: [PATCH] cgroup: keep cgroup_mutex locked for pre_destroy

3fa59dfb (cgroup: fix potential deadlock in pre_destroy) dropped the
cgroup_mutex lock while calling pre_destroy callbacks because memory
controller could deadlock because force_empty triggered reclaim.
Since "memcg: move charges to root cgroup if use_hierarchy=0" there is
no reclaim going on from mem_cgroup_force_empty though so we can safely
keep the cgroup_mutex locked. This has an advantage that no tasks might
be added during pre_destroy callback and so the handlers don't have to
consider races when new tasks add new charges. This simplifies the
implementation.
---
 kernel/cgroup.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0f3527d..9dba05d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4181,7 +4181,6 @@ again:
 		mutex_unlock(&cgroup_mutex);
 		return -EBUSY;
 	}
-	mutex_unlock(&cgroup_mutex);
 
 	/*
 	 * In general, subsystem has no css->refcnt after pre_destroy(). But
@@ -4204,7 +4203,6 @@ again:
 		return ret;
 	}
 
-	mutex_lock(&cgroup_mutex);
 	parent = cgrp->parent;
 	if (atomic_read(&cgrp->count) || !list_empty(&cgrp->children)) {
 		clear_bit(CGRP_WAIT_ON_RMDIR, &cgrp->flags);
-- 
1.7.10.4

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic
--
To unsubscribe from this list: send the line "unsubscribe cgroups" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html