On Thu, Jan 13, 2011 at 4:11 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote: > On Thu, 13 Jan 2011 14:00:32 -0800 > Ying Han <yinghan@xxxxxxxxxx> wrote: > >> The per cgroup kswapd is invoked when the cgroup's free memory (limit - usage) >> is less than a threshold--low_wmark. Then the kswapd thread starts to reclaim >> pages in a priority loop similar to global algorithm. The kswapd is done if the >> free memory is above a threshold--high_wmark. >> >> The per cgroup background reclaim is based on the per cgroup LRU and also adds >> per cgroup watermarks. There are two watermarks including "low_wmark" and >> "high_wmark", and they are calculated based on the limit_in_bytes(hard_limit) >> for each cgroup. Each time the hard_limit is changed, the corresponding wmarks >> are re-calculated. Since memory controller charges only user pages, there is >> no need for a "min_wmark". The current calculation of wmarks is a function of >> "memory.min_free_kbytes" which could be adjusted by writing different values >> into the new api. This is added mainly for debugging purpose. >> >> Change log v2...v1: >> 1. Remove the res_counter_charge on wmark due to performance concern. >> 2. Move the new APIs min_free_kbytes, reclaim_wmarks into seperate commit. >> 3. Calculate the min_free_kbytes automatically based on the limit_in_bytes. >> 4. make the wmark to be consistant with core VM which checks the free pages >> instead of usage. >> 5. changed wmark to be boolean >> >> Signed-off-by: Ying Han <yinghan@xxxxxxxxxx> > Thank you KAMEZAWA for your comments. > Hmm, I don't think using the same algorithm as min_free_kbytes is good. > > Why it's bad to have 2 interfaces as low_wmark and high_wmark ? > > And in this patch, min_free_kbytes can be [256...65536]...I think this > '256' is not good because it should be able to be set to '0'. > > IIUC, in enterprise systems, there are users who want to keep a fixed amount > of free memory always. This interface will not allow such use case. > > I think we should have 2 interfaces as low_wmark and high_wmark. But as default > value, the same value as to the alogorithm with min_free_kbytes will make sense. I agree that "min_free_kbytes" concept doesn't apply well since there is no notion of "reserved pool" in memcg. I borrowed it at the beginning is to add a tunable to the per-memcg watermarks besides the hard_limit. I read the patch posted from Satoru Moriya "Tunable watermarks", and introducing the per-memcg-per-watermark tunable sounds good to me. Might consider adding it to the next post. > > BTW, please divide res_counter part and memcg part in the next post. Will do. > > Please explain your handling of 'hierarchy' in description. I haven't thought through the 'hierarchy' handling in this patchset which I will probably put more thoughts in the following posts. Do you have recommendations on handing the 'hierarchy' ? --Ying > > Thanks, > -Kame > > >> --- >> include/linux/memcontrol.h | 1 + >> include/linux/res_counter.h | 83 +++++++++++++++++++++++++++++++++++++++++++ >> kernel/res_counter.c | 6 +++ >> mm/memcontrol.c | 73 +++++++++++++++++++++++++++++++++++++ >> 4 files changed, 163 insertions(+), 0 deletions(-) >> >> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h >> index 3433784..80a605f 100644 >> --- a/include/linux/memcontrol.h >> +++ b/include/linux/memcontrol.h >> @@ -93,6 +93,7 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem); >> >> extern struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page); >> extern struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p); >> +extern int mem_cgroup_watermark_ok(struct mem_cgroup *mem, int charge_flags); >> >> static inline >> int mm_match_cgroup(const struct mm_struct *mm, const struct mem_cgroup *cgroup) >> diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h >> index fcb9884..10b7e59 100644 >> --- a/include/linux/res_counter.h >> +++ b/include/linux/res_counter.h >> @@ -39,6 +39,15 @@ struct res_counter { >> */ >> unsigned long long soft_limit; >> /* >> + * the limit that reclaim triggers. it is the free count >> + * (limit - usage) >> + */ >> + unsigned long long low_wmark_limit; >> + /* >> + * the limit that reclaim stops. it is the free count >> + */ >> + unsigned long long high_wmark_limit; >> + /* >> * the number of unsuccessful attempts to consume the resource >> */ >> unsigned long long failcnt; >> @@ -55,6 +64,9 @@ struct res_counter { >> >> #define RESOURCE_MAX (unsigned long long)LLONG_MAX >> >> +#define CHARGE_WMARK_LOW 0x02 >> +#define CHARGE_WMARK_HIGH 0x04 >> + >> /** >> * Helpers to interact with userspace >> * res_counter_read_u64() - returns the value of the specified member