On Wed, 22 Jun 2011 17:53:09 +0200 Michal Hocko <mhocko@xxxxxxx> wrote: > On Thu 16-06-11 12:54:00, KAMEZAWA Hiroyuki wrote: > > From 88090fe10e225ad8769ba0ea01692b7314e8b973 Mon Sep 17 00:00:00 2001 > > From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> > > Date: Wed, 15 Jun 2011 16:19:46 +0900 > > Subject: [PATCH 4/7] memcg: update numa information based on event counter > > > > commit 889976 adds an numa node round-robin for memcg. But the information > > is updated once per 10sec. > > > > This patch changes the update trigger from jiffies to memcg's event count. > > After this patch, numa scan information will be updated when > > > > - the number of pagein/out events is larger than 3% of limit > > or > > - the number of pagein/out events is larger than 16k > > (==64MB pagein/pageout if pagesize==4k.) > > > > The counter of mem->numascan_update the sum of percpu events counter. > > When a task hits limit, it checks mem->numascan_update. If it's over > > min(3% of limit, 16k), numa information will be updated. > > Yes, I like the event based approach more than the origin (time) based > one. > > > > > This patch also adds mutex for updating information. This will allow us > > to avoid unnecessary scan. > > > > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> > > --- > > mm/memcontrol.c | 51 +++++++++++++++++++++++++++++++++++++++++++++------ > > 1 file changed, 45 insertions(+), 6 deletions(-) > > > > Index: mmotm-0615/mm/memcontrol.c > > =================================================================== > > --- mmotm-0615.orig/mm/memcontrol.c > > +++ mmotm-0615/mm/memcontrol.c > > @@ -108,10 +108,12 @@ enum mem_cgroup_events_index { > > enum mem_cgroup_events_target { > > MEM_CGROUP_TARGET_THRESH, > > MEM_CGROUP_TARGET_SOFTLIMIT, > > + MEM_CGROUP_TARGET_NUMASCAN, > > Shouldn't it be defined only for MAX_NUMNODES > 1 > Hmm, yes. But I want to reduce #ifdefs.. > > MEM_CGROUP_NTARGETS, > > }; > > #define THRESHOLDS_EVENTS_TARGET (128) > > #define SOFTLIMIT_EVENTS_TARGET (1024) > > +#define NUMASCAN_EVENTS_TARGET (1024) > > > > struct mem_cgroup_stat_cpu { > > long count[MEM_CGROUP_STAT_NSTATS]; > > @@ -288,8 +290,9 @@ struct mem_cgroup { > > int last_scanned_node; > > #if MAX_NUMNODES > 1 > > nodemask_t scan_nodes; > > - unsigned long next_scan_node_update; > > + struct mutex numascan_mutex; > > #endif > > + atomic_t numascan_update; > > Why it is out of ifdef? > This was for avoiding #ifdef in mem_cgroup_create()...but it's not used now. I'll fix this. > > /* > > * Should the accounting and control be hierarchical, per subtree? > > */ > > @@ -741,6 +744,9 @@ static void __mem_cgroup_target_update(s > > case MEM_CGROUP_TARGET_SOFTLIMIT: > > next = val + SOFTLIMIT_EVENTS_TARGET; > > break; > > + case MEM_CGROUP_TARGET_NUMASCAN: > > + next = val + NUMASCAN_EVENTS_TARGET; > > + break; > > MAX_NUMNODES > 1 > > > default: > > return; > > } > > @@ -764,6 +770,13 @@ static void memcg_check_events(struct me > > __mem_cgroup_target_update(mem, > > MEM_CGROUP_TARGET_SOFTLIMIT); > > } > > + if (unlikely(__memcg_event_check(mem, > > + MEM_CGROUP_TARGET_NUMASCAN))) { > > + atomic_add(MEM_CGROUP_TARGET_NUMASCAN, > > + &mem->numascan_update); > > + __mem_cgroup_target_update(mem, > > + MEM_CGROUP_TARGET_NUMASCAN); > > + } > > } > > again MAX_NUMNODES > 1 > Hmm, ok, I will add #ifdef only here. > > } > > > > @@ -1616,17 +1629,32 @@ mem_cgroup_select_victim(struct mem_cgro > > /* > > * Always updating the nodemask is not very good - even if we have an empty > > * list or the wrong list here, we can start from some node and traverse all > > - * nodes based on the zonelist. So update the list loosely once per 10 secs. > > + * nodes based on the zonelist. > > * > > + * The counter of mem->numascan_update is updated once per > > + * NUMASCAN_EVENTS_TARGET. We update the numa information when we see > > + * the number of event is larger than 3% of limit or 64MB pagein/pageout. > > */ > > +#define NUMASCAN_UPDATE_RATIO (3) > > +#define NUMASCAN_UPDATE_THRESH (16384UL) /* 16k events of pagein/pageout */ > > static void mem_cgroup_may_update_nodemask(struct mem_cgroup *mem) > > { > > int nid; > > - > > - if (time_after(mem->next_scan_node_update, jiffies)) > > + unsigned long long limit; > > + /* if no limit, we never reach here */ > > + limit = res_counter_read_u64(&mem->res, RES_LIMIT); > > + limit /= PAGE_SIZE; > > + /* 3% of limit */ > > + limit = (limit * NUMASCAN_UPDATE_RATIO/100UL); > > + limit = min_t(unsigned long long, limit, NUMASCAN_UPDATE_THRESH); > > + /* > > + * If the number of pagein/out event is larger than 3% of limit or > > + * 64MB pagein/out, refresh numa information. > > + */ > > + if (atomic_read(&mem->numascan_update) < limit || > > + !mutex_trylock(&mem->numascan_mutex)) > > return; > > I am not sure whether a mutex is not overkill here. What about using an > atomic operation instead? > I think mutex is informative than atomic counter for code readers. If influence of overhead is not big, I'd like to use mutex. Thanks, -Kame -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>