On Wed 16-11-22 19:28:10, Zhongkun He wrote:
> Hi Michal, I've done the performance testing, please check it out.
>
> > > Yes this is all understood but the level of the overhead is not really
> > > clear. So the question is whether this will induce a visible overhead.
> > > Because from the maintainability point of view it is much less costly to
> > > have a clear life time model. Right now we have a mix of reference
> > > counting and per-task requirements which is rather subtle and easy to
> > > get wrong. In an ideal world we would have get_vma_policy always
> > > returning a reference counted policy or NULL. If we really need to
> > > optimize for cache line bouncing we can go with per cpu reference
> > > counters (something that was not available at the time the mempolicy
> > > code has been introduced).
>
> > > So I am not saying that the task_work based solution is not possible I
> > > just think that this looks like a good opportunity to get away from the
> > > existing subtle model.
>
> Test tools:
> numactl -m 0-3 ./run-mmtests.sh -n -c configs/config-workload-
> aim9-pagealloc test_name
>
> Modification:
> get_vma_policy() and get_task_policy() always return a reference
> counted policy, except for the static policies (default_policy and
> preferred_node_policy[nid]).

It would be better to add the patch that has been tested.

> All vma manipulation is protected by a down_read, so mpol_get()
> can be called directly to take a refcount on the mpol. But there
> is no lock held in the task->mempolicy context, so task->mempolicy
> should be protected by task_lock.
>
> struct mempolicy *get_task_policy(struct task_struct *p)
> {
> 	struct mempolicy *pol;
> 	int node;
>
> 	if (p->mempolicy) {
> 		task_lock(p);
> 		pol = p->mempolicy;
> 		mpol_get(pol);
> 		task_unlock(p);
> 		if (pol)
> 			return pol;
> 	}

One way to deal with that would be to use a similar model as css_tryget.

Btw. have you tried to profile those slowdowns to identify hotspots?

Thanks
-- 
Michal Hocko
SUSE Labs
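
For what it's worth, here is a rough sketch of what a css_tryget-like
model could look like for get_task_policy(). It assumes task->mempolicy
becomes an RCU-managed pointer, pol->refcnt is converted from atomic_t
to refcount_t, and the final mpol_put() defers freeing by an RCU grace
period; mpol_tryget() is a made-up helper, not an existing interface,
and the fallback to the static policies is omitted:

/*
 * Hypothetical sketch only: assumes pol->refcnt is a refcount_t and
 * that the last mpol_put() frees the policy after an RCU grace period.
 */
static inline bool mpol_tryget(struct mempolicy *pol)
{
	/* Fails once the last reference has already been dropped. */
	return pol && refcount_inc_not_zero(&pol->refcnt);
}

struct mempolicy *get_task_policy(struct task_struct *p)
{
	struct mempolicy *pol;

	/* Assumes task->mempolicy is an RCU managed (__rcu) pointer. */
	rcu_read_lock();
	pol = rcu_dereference(p->mempolicy);
	if (!mpol_tryget(pol))
		pol = NULL;
	rcu_read_unlock();

	/* Caller drops the reference with mpol_put(). */
	return pol;
}

The idea is that a lockless reader can only succeed in taking a
reference while the policy is still live, so the task_lock() in the
hot path could be avoided.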