On Mon, Apr 25, 2011 at 2:29 AM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote: > Add memory.high_wmark_distance and reclaim_wmarks API per memcg. > The first adjust the internal low/high wmark calculation and > the reclaim_wmarks exports the current value of watermarks. > low_wmark is caclurated in automatic. > > $ echo 500m >/dev/cgroup/A/memory.limit_in_bytes > $ cat /dev/cgroup/A/memory.limit_in_bytes > 524288000 > > $ echo 50m >/dev/cgroup/A/memory.high_wmark_distance > > $ cat /dev/cgroup/A/memory.reclaim_wmarks > low_wmark 476053504 > high_wmark 471859200 > > Change v8a..v7 > 1. removed low_wmark_distance it's now automatic. > 2. added Documenation. > > Signed-off-by: Ying Han <yinghan@xxxxxxxxxx> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> > --- > Documentation/cgroups/memory.txt | 43 ++++++++++++++++++++++++++++ > mm/memcontrol.c | 58 +++++++++++++++++++++++++++++++++++++++ > 2 files changed, 100 insertions(+), 1 deletion(-) > > Index: memcg/mm/memcontrol.c > =================================================================== > --- memcg.orig/mm/memcontrol.c > +++ memcg/mm/memcontrol.c > @@ -4074,6 +4074,40 @@ static int mem_cgroup_swappiness_write(s > return 0; > } > > +static u64 mem_cgroup_high_wmark_distance_read(struct cgroup *cgrp, > + struct cftype *cft) > +{ > + struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp); > + > + return memcg->high_wmark_distance; > +} > + > +static int mem_cgroup_high_wmark_distance_write(struct cgroup *cont, > + struct cftype *cft, > + const char *buffer) > +{ > + struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); > + unsigned long long val; > + u64 limit; > + int ret; > + > + if (!cont->parent) > + return -EINVAL; > + > + ret = res_counter_memparse_write_strategy(buffer, &val); > + if (ret) > + return -EINVAL; > + > + limit = res_counter_read_u64(&memcg->res, RES_LIMIT); > + if (val >= limit) > + return -EINVAL; > + > + memcg->high_wmark_distance = val; > + > + setup_per_memcg_wmarks(memcg); > + return 0; > +} > + > static void __mem_cgroup_threshold(struct mem_cgroup *memcg, bool swap) > { > struct mem_cgroup_threshold_ary *t; > @@ -4365,6 +4399,21 @@ static void mem_cgroup_oom_unregister_ev > mutex_unlock(&memcg_oom_mutex); > } > > +static int mem_cgroup_wmark_read(struct cgroup *cgrp, > + struct cftype *cft, struct cgroup_map_cb *cb) > +{ > + struct mem_cgroup *mem = mem_cgroup_from_cont(cgrp); > + u64 low_wmark, high_wmark; > + > + low_wmark = res_counter_read_u64(&mem->res, RES_LOW_WMARK_LIMIT); > + high_wmark = res_counter_read_u64(&mem->res, RES_HIGH_WMARK_LIMIT); > + > + cb->fill(cb, "low_wmark", low_wmark); > + cb->fill(cb, "high_wmark", high_wmark); > + > + return 0; > +} > + > static int mem_cgroup_oom_control_read(struct cgroup *cgrp, > struct cftype *cft, struct cgroup_map_cb *cb) > { > @@ -4468,6 +4517,15 @@ static struct cftype mem_cgroup_files[] > .unregister_event = mem_cgroup_oom_unregister_event, > .private = MEMFILE_PRIVATE(_OOM_TYPE, OOM_CONTROL), > }, > + { > + .name = "high_wmark_distance", > + .write_string = mem_cgroup_high_wmark_distance_write, > + .read_u64 = mem_cgroup_high_wmark_distance_read, > + }, > + { > + .name = "reclaim_wmarks", > + .read_map = mem_cgroup_wmark_read, > + }, > }; > > #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP > Index: memcg/Documentation/cgroups/memory.txt > =================================================================== > --- memcg.orig/Documentation/cgroups/memory.txt > +++ memcg/Documentation/cgroups/memory.txt > @@ -68,6 +68,8 @@ Brief summary of control files. > (See sysctl's vm.swappiness) > memory.move_charge_at_immigrate # set/show controls of moving charges > memory.oom_control # set/show oom controls. > + memory.hiwmark_distance # set/show watermark control > + memory.reclaim_wmarks # show watermark details. > > 1. History > > @@ -501,6 +503,7 @@ NOTE2: When panic_on_oom is set to "2", > case of an OOM event in any cgroup. > > 7. Soft limits > +(See Watermarks, too.) > > Soft limits allow for greater sharing of memory. The idea behind soft limits > is to allow control groups to use as much of the memory as needed, provided > @@ -649,7 +652,45 @@ At reading, current status of OOM is sho > under_oom 0 or 1 (if 1, the memory cgroup is under OOM, tasks may > be stopped.) > > -11. TODO > +11. Watermarks > + > +Tasks gets big overhead when it hits memory limit because it needs to scan > +memory and free them. To avoid that, some background memory freeing by > +kernel will be helpful. Memory cgroup supports background memory freeing > +by threshold called Watermarks. It can be used for fuzzy limiting of memory. > + > +For example, if you have 1G limit and set > + - high_watermark ....980M > + - low_watermark ....984M > +Memory freeing work by kernel starts when usage goes over 984M until memory > +usage goes down to 980M. Of course, this cousumes CPU. So, the kernel controls > +this work to avoid too much cpu hogging. > + > +11.1 memory.high_wmark_distance > + > +This is an interface for high_wmark. You can specify the distance between > +the limit of memory and high_watemark here. For example, under 1G limit memroy > +cgroup, > + # echo 20M > memory.high_wmark_distance > +will set high_watermark as 980M. low_watermark is _automatically_ determined > +because big distance between high-low watermark tend to use too much CPU and > +it's difficult to determine low_watermark by users. > + > +With this, memory usage will be reduced to 980M as time goes by. > +After setting memory.high_wmark_distance to be 20M, assume you update > +memory.limit_in_bytes to be 2G bytes. In this case, hiwh_watermak is 1980M. > + > +Another thinking, assume you have memory.limit_in_bytes to be 1G. > +Then, set memory.high_wmark_distance as 300M. Then, you can limit memory > +usage under 700M in moderate way and you can limit it under 1G with hard > +limit. > + > +11.2 memory.reclaim_wmarks > + > +This interface shows high_watermark and low_watermark in bytes. Maybe > +useful at compareing usage/watermarks. > + > +12. TODO > > 1. Add support for accounting huge pages (as a separate controller) > 2. Make per-cgroup scanner reclaim not-shared pages first > Thank you and this looks good me, and I can certainly apply that on the next post. --Ying -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href