Re: [PATCH -V2 4/9] memcg: Add non reclaim resource tracking to memcg

KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> · Thu, 8 Mar 2012 14:56:28 +0900

On Sun, 04 Mar 2012 23:37:22 +0530
"Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx> wrote:

> On Fri, 2 Mar 2012 17:38:16 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> > On Thu,  1 Mar 2012 14:46:15 +0530
> > "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx> wrote:
> > 
> > > From: "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx>

> > 
> > > +	help
> > > +	  Add non reclaim resource management to memory resource controller.
> > > +	  Currently only HugeTLB pages will be managed using this extension.
> > > +	  The controller limit is enforced during mmap(2), so that
> > > +	  application can fall back to allocations using smaller page size
> > > +	  if the memory controller limit prevented them from allocating HugeTLB
> > > +	  pages.
> > > +
> > 
> > Hm. In other thread, KMEM accounting is discussed. There is 2 proposals and
> >  - 1st is accounting only reclaimable slabs (as dcache etc.)
> >  - 2nd is accounting all slab allocations.
> > 
> > Here, 2nd one includes NORECLAIM kmem cache. (Discussion is not ended.)
> > 
> > So, for your developments,  How about MEM_RES_CTLR_HUGEPAGE ?
> 
> Frankly I didn't like the noreclaim name, I also didn't want to indicate
> HUGEPAGE, because the code doesn't make any huge page assumption.

You can add this config for HUGEPAGE interfaces.
Later we can sort out other configs.

> > 
> > 
> > >  config CGROUP_MEM_RES_CTLR_SWAP
> > >  	bool "Memory Resource Controller Swap Extension"
> > >  	depends on CGROUP_MEM_RES_CTLR && SWAP
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 6728a7a..b00d028 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -49,6 +49,7 @@
> > >  #include <linux/page_cgroup.h>
> > >  #include <linux/cpu.h>
> > >  #include <linux/oom.h>
> > > +#include <linux/region.h>
> > >  #include "internal.h"
> > >  #include <net/sock.h>
> > >  #include <net/tcp_memcontrol.h>
> > > @@ -214,6 +215,11 @@ static void mem_cgroup_threshold(struct mem_cgroup *memcg);
> > >  static void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
> > >  
> > >  /*
> > > + * Currently only hugetlbfs pages are tracked using no reclaim
> > > + * resource count. So we need only MAX_HSTATE res counter
> > > + */
> > > +#define MEMCG_MAX_NORECLAIM HUGE_MAX_HSTATE
> > > +/*
> > >   * The memory controller data structure. The memory controller controls both
> > >   * page cache and RSS per cgroup. We would eventually like to provide
> > >   * statistics based on the statistics developed by Rik Van Riel for clock-pro,
> > > @@ -235,6 +241,11 @@ struct mem_cgroup {
> > >  	 */
> > >  	struct res_counter memsw;
> > >  	/*
> > > +	 * the counter to account for non reclaim resources
> > > +	 * like hugetlb pages
> > > +	 */
> > > +	struct res_counter no_rcl_res[MEMCG_MAX_NORECLAIM];
> > 
> > struct res_counter hugepages;
> > 
> > will be ok.
> > 
> 
> My goal was to make this patch not to mention hugepages, because
> it doesn't really have any depedency on hugepages. That is one of the reason
> for adding MEMCG_MAX_NORECLAIM. Later if we want other in memory file system
> (shmemfs) to limit the resource usage in a similar fashion, we should be
> able to use this memcg changes.
> 
> May be for this patchset I can make the changes you suggested and later
> when we want to reuse the code make it more generic ?
> 

yes. If there is no user interface change, internal code change will be welcomed.

> 
> > 
> > > +	/*
> > >  	 * Per cgroup active and inactive list, similar to the
> > >  	 * per zone LRU lists.
> > >  	 */
> > > @@ -4887,6 +4898,7 @@ err_cleanup:
> > >  static struct cgroup_subsys_state * __ref
> > >  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > >  {
> > > +	int idx;
> > >  	struct mem_cgroup *memcg, *parent;
> > >  	long error = -ENOMEM;
> > >  	int node;
> > > @@ -4922,6 +4934,10 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > >  	if (parent && parent->use_hierarchy) {
> > >  		res_counter_init(&memcg->res, &parent->res);
> > >  		res_counter_init(&memcg->memsw, &parent->memsw);
> > > +		for (idx = 0; idx < MEMCG_MAX_NORECLAIM; idx++) {
> > > +			res_counter_init(&memcg->no_rcl_res[idx],
> > > +					 &parent->no_rcl_res[idx]);
> > > +		}
> > 
> > You can remove this kinds of loop and keep your implemenation simple.
> 
> 
> Can you explain this ? How can we remote the loop ?. We want to track
> each huge page size as a seperate resource. 
> 
Ah, sorry. I miseed it. please ignore.

> > > +long mem_cgroup_try_noreclaim_charge(struct list_head *chg_list,
> > > +				     unsigned long from, unsigned long to,
> > > +				     int idx)
> > > +{
> > > +	long chg;
> > > +	int ret = 0;
> > > +	unsigned long csize;
> > > +	struct mem_cgroup *memcg;
> > > +	struct res_counter *fail_res;
> > > +
> > > +	/*
> > > +	 * Get the task cgroup within rcu_readlock and also
> > > +	 * get cgroup reference to make sure cgroup destroy won't
> > > +	 * race with page_charge. We don't allow a cgroup destroy
> > > +	 * when the cgroup have some charge against it
> > > +	 */
> > > +	rcu_read_lock();
> > > +	memcg = mem_cgroup_from_task(current);
> > > +	css_get(&memcg->css);
> > 
> > css_tryget() ?
> > 
> 
> 
> Why ?
> 

current<->cgroup relationship isn't under any locks. So, we do speculative
access with rcu_read_lock() and css_tryget().

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>