Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension


Michal Hocko <mhocko@xxxxxxx> writes:

> On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote:
>> From: "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
>> 
>> This patch implements a memcg extension that allows us to control
>> HugeTLB allocations via memory controller.
>
> And the infrastructure is not used at this stage (you forgot to
> mention).
> The changelog should be much more descriptive.


Will update the changelog.

>
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
>> ---
>>  include/linux/hugetlb.h    |    1 +
>>  include/linux/memcontrol.h |   42 +++++++++++++
>>  init/Kconfig               |    8 +++
>>  mm/hugetlb.c               |    2 +-
>>  mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 190 insertions(+), 1 deletions(-)
>> 
> [...]
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 3f42cd6..f0eb8aa 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -725,6 +725,14 @@ config CGROUP_PERF
>>  
>>  	  Say N if unsure.
>>  
>> +config MEM_RES_CTLR_HUGETLB
>> +	bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)"
>> +	depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL
>> +	default n
>> +	help
>> +	  Add HugeTLB management to memory resource controller. When you
>> +	  enable this, you can put a per cgroup limit on HugeTLB usage.
>
> How does it interact with the hard/soft limits, etc.?


There is no soft limit support in the HugeTLB extension.
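For reference, a hard limit means a charge is either accepted at every level of the hierarchy or rejected outright. Below is a minimal userspace sketch of that behaviour, loosely modeled on how res_counter_charge() walks up the parents and undoes partial charges. struct counter and its helpers are hypothetical names, not the kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model of a hierarchical resource counter. */
struct counter {
	unsigned long long usage;
	unsigned long long limit;   /* hard limit, in bytes */
	unsigned long long failcnt; /* number of rejected charges */
	struct counter *parent;
};

static void counter_init(struct counter *c, struct counter *parent,
			 unsigned long long limit)
{
	c->usage = 0;
	c->limit = limit;
	c->failcnt = 0;
	c->parent = parent;
}

/*
 * Charge val bytes to c and to every ancestor. If any level would go
 * over its hard limit, undo the levels already charged, report the
 * failing level through *fail_at and return -ENOMEM. There is no soft
 * limit in this model: the charge fits everywhere or it is refused.
 */
static int counter_charge(struct counter *c, unsigned long long val,
			  struct counter **fail_at)
{
	struct counter *it, *undo;

	*fail_at = NULL;
	for (it = c; it != NULL; it = it->parent) {
		if (it->usage + val > it->limit) {
			it->failcnt++;
			*fail_at = it;
			for (undo = c; undo != it; undo = undo->parent)
				undo->usage -= val;
			return -ENOMEM;
		}
		it->usage += val;
	}
	return 0;
}

/* Release val bytes from c and every ancestor. */
static void counter_uncharge(struct counter *c, unsigned long long val)
{
	for (; c != NULL; c = c->parent)
		c->usage -= val;
}
```

Reporting the failing level lets a caller tell whether its own limit or an ancestor's limit rejected the charge.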

>
> [...]
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 6728a7a..4b36c5e 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -235,6 +235,10 @@ struct mem_cgroup {
>>  	 */
>>  	struct res_counter memsw;
>>  	/*
>> +	 * the counter to account for hugepages from hugetlb.
>> +	 */
>> +	struct res_counter hugepage[HUGE_MAX_HSTATE];
>> +	/*
>>  	 * Per cgroup active and inactive list, similar to the
>>  	 * per zone LRU lists.
>>  	 */
>> @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
>>  }
>>  #endif
>>  
>> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
>> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
>> +{
>> +	int idx;
>> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
>
> Maybe we should expose for_each_hstate as well...


That will not really help here. If we use for_each_hstate, we will
still need hstate_index() to get the counter index.
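To make the comparison concrete, here are the two loop shapes side by side in a minimal userspace sketch. The hstates array, the for_each_hstate() macro and hstate_index() are stand-ins written for illustration, and HUGE_MAX_HSTATE = 2 is an assumed value.

```c
#include <assert.h>
#include <stdbool.h>

#define HUGE_MAX_HSTATE 2 /* assumed value for this sketch */

/* Hypothetical stand-ins for the kernel's hstate array and helpers. */
struct hstate { unsigned int order; };
static struct hstate hstates[HUGE_MAX_HSTATE];
static int hugetlb_max_hstate;

#define for_each_hstate(h) \
	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)

static inline int hstate_index(struct hstate *h)
{
	return h - hstates;
}

struct usage_model { unsigned long long usage[HUGE_MAX_HSTATE]; };

/* Index loop, as in the patch: idx addresses the counter directly. */
static bool have_usage_by_index(const struct usage_model *m)
{
	int idx;

	for (idx = 0; idx < hugetlb_max_hstate; idx++)
		if (m->usage[idx] > 0)
			return true;
	return false;
}

/* Iterator loop: still needs hstate_index() to find the counter. */
static bool have_usage_by_hstate(const struct usage_model *m)
{
	struct hstate *h;

	for_each_hstate(h)
		if (m->usage[hstate_index(h)] > 0)
			return true;
	return false;
}
```

Both functions visit the same slots. The iterator form only adds a pointer-to-index conversion, which is the point made above.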

>
>> +		if (memcg->hugepage[idx].usage > 0)
>> +			return 1;
>> +	}
>> +	return 0;
>> +}
>> +
>> +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
>> +				   struct mem_cgroup **ptr)
>> +{
>> +	int ret = 0;
>> +	struct mem_cgroup *memcg;
>> +	struct res_counter *fail_res;
>> +	unsigned long csize = nr_pages * PAGE_SIZE;
>> +
>> +	if (mem_cgroup_disabled())
>> +		return 0;
>> +again:
>> +	rcu_read_lock();
>> +	memcg = mem_cgroup_from_task(current);
>> +	if (!memcg)
>> +		memcg = root_mem_cgroup;
>> +	if (mem_cgroup_is_root(memcg)) {
>> +		rcu_read_unlock();
>> +		goto done;
>> +	}
>> +	if (!css_tryget(&memcg->css)) {
>> +		rcu_read_unlock();
>> +		goto again;
>> +	}
>> +	rcu_read_unlock();
>> +
>> +	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
>> +	css_put(&memcg->css);
>> +done:
>> +	*ptr = memcg;
>
> Why do we set ptr even for the failure case after we dropped a
> reference?


That ensures that *ptr is NULL. 
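The calling contract under discussion is a two-phase charge/commit. This minimal userspace model uses hypothetical names, not kernel API. It writes *ptr on every path and, as an assumption of the sketch, leaves it NULL when the charge fails, so a caller commits only after a successful charge. The commit step mirrors the PageCgroupUsed() branch of mem_cgroup_hugetlb_commit_charge(): it drops the reservation if the page already has an owner.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a cgroup's hugetlb counter. */
struct memcg_model {
	unsigned long long usage, limit;
};

/* Hypothetical model of a page_cgroup. */
struct page_model {
	bool used;                 /* models PageCgroupUsed() */
	struct memcg_model *memcg; /* models pc->mem_cgroup */
};

/*
 * Phase 1: reserve nr_bytes. *ptr is written on both paths; in this
 * sketch it is NULL on failure (an assumption, not the patch's code).
 */
static int model_charge(struct memcg_model *memcg, unsigned long long nr_bytes,
			struct memcg_model **ptr)
{
	if (memcg->usage + nr_bytes > memcg->limit) {
		*ptr = NULL;
		return -ENOMEM;
	}
	memcg->usage += nr_bytes;
	*ptr = memcg;
	return 0;
}

/*
 * Phase 2: bind the page to the charged group, or give the
 * reservation back if the page is already owned.
 */
static void model_commit(struct page_model *page, unsigned long long nr_bytes,
			 struct memcg_model *memcg)
{
	if (page->used) {
		memcg->usage -= nr_bytes;
		return;
	}
	page->memcg = memcg;
	page->used = true;
}
```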

>
>> +	return ret;
>> +}
>> +
>> +void mem_cgroup_hugetlb_commit_charge(int idx, unsigned long nr_pages,
>> +				      struct mem_cgroup *memcg,
>> +				      struct page *page)
>> +{
>> +	struct page_cgroup *pc;
>> +
>> +	if (mem_cgroup_disabled())
>> +		return;
>> +
>> +	pc = lookup_page_cgroup(page);
>> +	lock_page_cgroup(pc);
>> +	if (unlikely(PageCgroupUsed(pc))) {
>> +		unlock_page_cgroup(pc);
>> +		mem_cgroup_hugetlb_uncharge_memcg(idx, nr_pages, memcg);
>> +		return;
>> +	}
>> +	pc->mem_cgroup = memcg;
>> +	/*
>> +	 * We access a page_cgroup asynchronously without lock_page_cgroup().
>> +	 * Especially when a page_cgroup is taken from a page, pc->mem_cgroup
>> +	 * is accessed after testing USED bit. To make pc->mem_cgroup visible
>> +	 * before USED bit, we need memory barrier here.
>> +	 * See mem_cgroup_add_lru_list(), etc.
>> +	 */
>> +	smp_wmb();
>
> Is this really necessary for hugetlb pages as well?

I used to do that in the cgroup_rmdir path, but I later changed that part
of the code. I will look at the patches again to see whether we really
need this.
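The comment in the patch describes a publish/subscribe ordering: pc->mem_cgroup must become visible before the USED bit, because lockless readers test USED first. As a reference point for deciding whether hugetlb pages need it, here is a minimal C11 userspace sketch of the same pattern. struct pc_model and the helpers are hypothetical, and atomic_thread_fence() stands in for smp_wmb() on the writer side and a read barrier on the reader side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page_cgroup: owner pointer plus USED flag. */
struct pc_model {
	void *mem_cgroup; /* plain store, like pc->mem_cgroup */
	atomic_bool used; /* models the USED bit */
};

/* Writer: store the owner, then publish the USED flag. */
static void pc_publish(struct pc_model *pc, void *memcg)
{
	pc->mem_cgroup = memcg;
	/* Analogue of smp_wmb(): owner store is ordered before flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&pc->used, true, memory_order_relaxed);
}

/* Lockless reader: returns the owner only once USED is visible. */
static void *pc_lookup(struct pc_model *pc)
{
	if (!atomic_load_explicit(&pc->used, memory_order_relaxed))
		return NULL;
	/* Pairs with the release fence in pc_publish(). */
	atomic_thread_fence(memory_order_acquire);
	return pc->mem_cgroup;
}
```

Without the fence pair, a concurrent reader could observe USED set and still read a stale mem_cgroup. Whether any hugetlb path reads page_cgroup this way without lock_page_cgroup() is the question to check.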


>
>> +	SetPageCgroupUsed(pc);
>> +
>> +	unlock_page_cgroup(pc);
>> +	return;
>> +}
>> +
> [...]
>> @@ -4887,6 +5013,7 @@ err_cleanup:
>>  static struct cgroup_subsys_state * __ref
>>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>>  {
>> +	int idx;
>>  	struct mem_cgroup *memcg, *parent;
>>  	long error = -ENOMEM;
>>  	int node;
>> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>>  		 * mem_cgroup(see mem_cgroup_put).
>>  		 */
>>  		mem_cgroup_get(parent);
>> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
>
> Do we have to init all hstates or is hugetlb_max_hstate enough?


Yes. We call mem_cgroup_create for the root cgroup before initializing
the hugetlb hstates.
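To illustrate the ordering problem behind that answer: the root group is created while hugetlb_max_hstate is still 0. A loop bounded by hugetlb_max_hstate would therefore initialize none of the root's counters. This is a minimal userspace sketch. The struct names, the COUNTER_MAX sentinel, register_hstate() and HUGE_MAX_HSTATE = 2 are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define HUGE_MAX_HSTATE 2   /* assumed value for this sketch */
#define COUNTER_MAX (~0ULL) /* "unlimited" */

/* Hypothetical counter and per-cgroup state. */
struct counter { unsigned long long usage, limit; struct counter *parent; };
struct memcg_model { struct counter hugepage[HUGE_MAX_HSTATE]; };

static int hugetlb_max_hstate; /* grows as hstates are registered */

static void counter_init(struct counter *c, struct counter *parent)
{
	c->usage = 0;
	c->limit = COUNTER_MAX;
	c->parent = parent;
}

/*
 * Initialize every slot, not only the first hugetlb_max_hstate ones,
 * because hstates may be registered after this group is created.
 */
static void memcg_model_create(struct memcg_model *m, struct memcg_model *parent)
{
	int idx;

	for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
		counter_init(&m->hugepage[idx],
			     parent ? &parent->hugepage[idx] : NULL);
}

/* Hypothetical stand-in for registering an hstate at boot. */
static int register_hstate(void)
{
	return hugetlb_max_hstate < HUGE_MAX_HSTATE ? hugetlb_max_hstate++ : -1;
}
```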

>
>> +			res_counter_init(&memcg->hugepage[idx],
>> +					 &parent->hugepage[idx]);
>>  	} else {
>>  		res_counter_init(&memcg->res, NULL);
>>  		res_counter_init(&memcg->memsw, NULL);
>> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
>> +			res_counter_init(&memcg->hugepage[idx], NULL);
>
> Same here
> -- 

-aneesh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

