On Sun, 04 Mar 2012 23:37:22 +0530
"Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx> wrote:

> On Fri, 2 Mar 2012 17:38:16 +0900, KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
> > On Thu, 1 Mar 2012 14:46:15 +0530
> > "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > > From: "Aneesh Kumar K.V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
> > >
> > > +        help
> > > +          Add non reclaim resource management to memory resource controller.
> > > +          Currently only HugeTLB pages will be managed using this extension.
> > > +          The controller limit is enforced during mmap(2), so that
> > > +          application can fall back to allocations using smaller page size
> > > +          if the memory controller limit prevented them from allocating HugeTLB
> > > +          pages.
> > > +
> >
> > Hm. In another thread, KMEM accounting is discussed. There are 2 proposals:
> >  - 1st is accounting only reclaimable slabs (as dcache etc.)
> >  - 2nd is accounting all slab allocations.
> >
> > Here, the 2nd one includes NORECLAIM kmem cache. (Discussion is not ended.)
> >
> > So, for your developments, how about MEM_RES_CTLR_HUGEPAGE ?
>
> Frankly I didn't like the noreclaim name, I also didn't want to indicate
> HUGEPAGE, because the code doesn't make any huge page assumption.

You can add this config for HUGEPAGE interfaces.
Later we can sort out other configs.

> > >
> > >  config CGROUP_MEM_RES_CTLR_SWAP
> > >          bool "Memory Resource Controller Swap Extension"
> > >          depends on CGROUP_MEM_RES_CTLR && SWAP
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 6728a7a..b00d028 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -49,6 +49,7 @@
> > >  #include <linux/page_cgroup.h>
> > >  #include <linux/cpu.h>
> > >  #include <linux/oom.h>
> > > +#include <linux/region.h>
> > >  #include "internal.h"
> > >  #include <net/sock.h>
> > >  #include <net/tcp_memcontrol.h>
> > > @@ -214,6 +215,11 @@ static void mem_cgroup_threshold(struct mem_cgroup *memcg);
> > >  static void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
> > >
> > >  /*
> > > + * Currently only hugetlbfs pages are tracked using no reclaim
> > > + * resource count. So we need only MAX_HSTATE res counter
> > > + */
> > > +#define MEMCG_MAX_NORECLAIM HUGE_MAX_HSTATE
> > > +/*
> > >   * The memory controller data structure. The memory controller controls both
> > >   * page cache and RSS per cgroup. We would eventually like to provide
> > >   * statistics based on the statistics developed by Rik Van Riel for clock-pro,
> > > @@ -235,6 +241,11 @@ struct mem_cgroup {
> > >           */
> > >          struct res_counter memsw;
> > >          /*
> > > +         * the counter to account for non reclaim resources
> > > +         * like hugetlb pages
> > > +         */
> > > +        struct res_counter no_rcl_res[MEMCG_MAX_NORECLAIM];
> >
> > struct res_counter hugepages;
> >
> > will be ok.
> >
>
> My goal was to make this patch not mention hugepages, because
> it doesn't really have any dependency on hugepages. That is one of the reasons
> for adding MEMCG_MAX_NORECLAIM. Later if we want other in-memory file systems
> (shmemfs) to limit the resource usage in a similar fashion, we should be
> able to use these memcg changes.
>
> Maybe for this patchset I can make the changes you suggested and later
> when we want to reuse the code make it more generic ?
>

yes.
If there is no user interface change, internal code change will be welcomed.

> > >
> > > +        /*
> > >           * Per cgroup active and inactive list, similar to the
> > >           * per zone LRU lists.
> > >           */
> > > @@ -4887,6 +4898,7 @@ err_cleanup:
> > >  static struct cgroup_subsys_state * __ref
> > >  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > >  {
> > > +        int idx;
> > >          struct mem_cgroup *memcg, *parent;
> > >          long error = -ENOMEM;
> > >          int node;
> > > @@ -4922,6 +4934,10 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > >          if (parent && parent->use_hierarchy) {
> > >                  res_counter_init(&memcg->res, &parent->res);
> > >                  res_counter_init(&memcg->memsw, &parent->memsw);
> > > +                for (idx = 0; idx < MEMCG_MAX_NORECLAIM; idx++) {
> > > +                        res_counter_init(&memcg->no_rcl_res[idx],
> > > +                                         &parent->no_rcl_res[idx]);
> > > +                }
> >
> > You can remove this kind of loop and keep your implementation simple.
> >
>
> Can you explain this ? How can we remove the loop ? We want to track
> each huge page size as a separate resource.
>

Ah, sorry. I missed it. Please ignore.

> > > +long mem_cgroup_try_noreclaim_charge(struct list_head *chg_list,
> > > +                                     unsigned long from, unsigned long to,
> > > +                                     int idx)
> > > +{
> > > +        long chg;
> > > +        int ret = 0;
> > > +        unsigned long csize;
> > > +        struct mem_cgroup *memcg;
> > > +        struct res_counter *fail_res;
> > > +
> > > +        /*
> > > +         * Get the task cgroup within rcu_readlock and also
> > > +         * get cgroup reference to make sure cgroup destroy won't
> > > +         * race with page_charge. We don't allow a cgroup destroy
> > > +         * when the cgroup have some charge against it
> > > +         */
> > > +        rcu_read_lock();
> > > +        memcg = mem_cgroup_from_task(current);
> > > +        css_get(&memcg->css);
> >
> > css_tryget() ?
> >
>
> Why ?
>

The current<->cgroup relationship isn't under any locks. So, we do speculative
access with rcu_read_lock() and css_tryget().
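
To illustrate why a tryget is wanted here rather than an unconditional get,
below is a minimal userspace sketch using C11 atomics. It is not the kernel
implementation: struct ref and the ref_* helpers are hypothetical names made
up for this example, and the real css refcount is more involved. The point is
only that a speculative lookup must be able to fail once teardown has started.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object such as a css. */
struct ref {
	atomic_int count;	/* 0 means teardown has started */
};

static inline void ref_init(struct ref *r)
{
	atomic_init(&r->count, 1);	/* the creator holds the first reference */
}

/* Unconditional get: only valid if the caller already holds a reference. */
static inline void ref_get(struct ref *r)
{
	int old = atomic_fetch_add(&r->count, 1);

	assert(old > 0);	/* otherwise we just revived a dying object */
}

/* Speculative get: succeeds only while the count is still nonzero. */
static inline bool ref_tryget(struct ref *r)
{
	int old = atomic_load(&r->count);

	while (old > 0) {
		/* On failure the CAS reloads 'old', and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static inline bool ref_put(struct ref *r)
{
	int old = atomic_fetch_sub(&r->count, 1);

	assert(old > 0);
	return old == 1;
}
```

In this model a caller that found the object without holding a lock would
call ref_tryget() and back off when it returns false, instead of calling
ref_get() on an object that may already be going away.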

Thanks,
-Kame