On Wed 16-11-22 19:28:10, Zhongkun He wrote:
> Hi Michal, I've done the performance testing, please check it out.
>
> > > Yes this is all understood but the level of the overhead is not really
> > > clear. So the question is whether this will induce a visible overhead.
> > > Because from the maintainability point of view it is much less costly to
> > > have a clear life time model. Right now we have a mix of reference
> > > counting and per-task requirements which is rather subtle and easy to
> > > get wrong. In an ideal world we would have get_vma_policy always
> > > returning a reference counted policy or NULL. If we really need to
> > > optimize for cache line bouncing we can go with per cpu reference
> > > counters (something that was not available at the time the mempolicy
> > > code has been introduced).
>
> > > So I am not saying that the task_work based solution is not possible I
> > > just think that this looks like a good opportunity to get away from the
> > > existing subtle model.
>
> Test tools:
> numactl -m 0-3 ./run-mmtests.sh -n -c configs/config-workload-
> aim9-pagealloc test_name
>
> Modification:
> get_vma_policy() and get_task_policy() always return a reference
> counted policy, except for the static policies (default_policy and
> preferred_node_policy[nid]).

It would be better to add the patch that has been tested.

> All vma manipulation is protected by a down_read, so mpol_get()
> can be called directly to take a refcount on the mpol. But there
> is no lock held in the task->mempolicy context, so task->mempolicy
> should be protected by task_lock.
>
> struct mempolicy *get_task_policy(struct task_struct *p)
> {
> 	struct mempolicy *pol;
> 	int node;
>
> 	if (p->mempolicy) {
> 		task_lock(p);
> 		pol = p->mempolicy;
> 		mpol_get(pol);
> 		task_unlock(p);
> 		if (pol)
> 			return pol;
> 	}

One way to deal with that would be to use a similar model as css_tryget.

Btw. have you tried to profile those slowdowns to identify hotspots?

Thanks
-- 
Michal Hocko
SUSE Labs
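
For what it's worth, here is a rough sketch of what a css_tryget-like
model could look like for get_task_policy(). It assumes task->mempolicy
becomes an RCU-managed pointer, pol->refcnt is converted from atomic_t
to refcount_t, and the final mpol_put() defers freeing by an RCU grace
period; mpol_tryget() is a made-up helper, not an existing interface,
and the fallback to the static policies is omitted:

/*
 * Hypothetical sketch only: assumes pol->refcnt is a refcount_t and
 * that the last mpol_put() frees the policy after an RCU grace period.
 */
static inline bool mpol_tryget(struct mempolicy *pol)
{
	/* Fails once the last reference has already been dropped. */
	return pol && refcount_inc_not_zero(&pol->refcnt);
}

struct mempolicy *get_task_policy(struct task_struct *p)
{
	struct mempolicy *pol;

	/* Assumes task->mempolicy is an RCU managed (__rcu) pointer. */
	rcu_read_lock();
	pol = rcu_dereference(p->mempolicy);
	if (!mpol_tryget(pol))
		pol = NULL;
	rcu_read_unlock();

	/* Caller drops the reference with mpol_put(). */
	return pol;
}

The idea is that a lockless reader can only succeed in taking a
reference while the policy is still live, so the task_lock() in the
hot path could be avoided.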