On Mon, Nov 29, 2010 at 11:21 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote: > On Mon, 29 Nov 2010 22:49:43 -0800 > Ying Han <yinghan@xxxxxxxxxx> wrote: > >> The per cgroup kswapd is invoked at mem_cgroup_charge when the cgroup's memory >> usage above a threshold--low_wmark. Then the kswapd thread starts to reclaim >> pages in a priority loop similar to global algorithm. The kswapd is done if the >> memory usage below a threshold--high_wmark. >> >> The per cgroup background reclaim is based on the per cgroup LRU and also adds >> per cgroup watermarks. There are two watermarks including "low_wmark" and >> "high_wmark", and they are calculated based on the limit_in_bytes(hard_limit) >> for each cgroup. Each time the hard_limit is change, the corresponding wmarks >> are re-calculated. Since memory controller charges only user pages, there is >> no need for a "min_wmark". The current calculation of wmarks is a function of >> "memory.min_free_kbytes" which could be adjusted by writing different values >> into the new api. This is added mainly for debugging purpose. >> >> Signed-off-by: Ying Han <yinghan@xxxxxxxxxx> > > A few points. > > 1. I can understand the motivation for including low/high watermark to > res_coutner. But, sadly, compareing all charge will make the counter slow. > IMHO, as memory controller threshold-check or soft limit, checking usage > periodically based on event counter is enough. It will be low cost. If we have other limits using the event counter, this sounds a feasible try for the wmarks. I can look into that. > > 2. min_free_kbytes must be automatically calculated. > For example, max(3% of limit, 20MB) or some. Now the wmark is automatically calculated based on the limit. Adding the min_free_kbytes gives us more flexibility to adjust the portion of the threshold. This could just be a performance tuning parameter later. I need it now at least at the beginning before figuring out a reasonable calculation formula. > > 3. When you allow min_free_kbytes to be set by users, please compare > it with the limit. > I think min_free_kbyte interface itself should be in another patch... > interface code tends to make patch bigger. Sounds feasible. --Ying > > >> --- >> include/linux/memcontrol.h | 1 + >> include/linux/res_counter.h | 88 ++++++++++++++++++++++++++++++- >> kernel/res_counter.c | 26 ++++++++-- >> mm/memcontrol.c | 123 +++++++++++++++++++++++++++++++++++++++++-- >> mm/vmscan.c | 10 ++++ >> 5 files changed, 238 insertions(+), 10 deletions(-) >> >> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h >> index 159a076..90fe7fe 100644 >> --- a/include/linux/memcontrol.h >> +++ b/include/linux/memcontrol.h >> @@ -76,6 +76,7 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem); >> >> extern struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page); >> extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p); >> +extern int mem_cgroup_watermark_ok(struct mem_cgroup *mem, int charge_flags); >> >> static inline >> int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup) >> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h >> index fcb9884..eed12c5 100644 >> --- a/include/linux/res_counter.h >> +++ b/include/linux/res_counter.h >> @@ -39,6 +39,16 @@ struct res_counter { >> */ >> unsigned long long soft_limit; >> /* >> + * the limit that reclaim triggers. TODO: res_counter in mem >> + * or wmark_limit. >> + */ >> + unsigned long long low_wmark_limit; >> + /* >> + * the limit that reclaim stops. TODO: res_counter in mem or >> + * wmark_limit. >> + */ >> + unsigned long long high_wmark_limit; >> + /* >> * the number of unsuccessful attempts to consume the resource >> */ >> unsigned long long failcnt; >> @@ -55,6 +65,10 @@ struct res_counter { >> >> #define RESOURCE_MAX (unsigned long long)LLONG_MAX >> >> +#define CHARGE_WMARK_MIN 0x01 >> +#define CHARGE_WMARK_LOW 0x02 >> +#define CHARGE_WMARK_HIGH 0x04 >> + >> /** >> * Helpers to interact with userspace >> * res_counter_read_u64() - returns the value of the specified member