Re: [RFC PATCH 04/11] mm/mempolicy: modify get_mempolicy call stack to take a task argument

Michal Hocko <mhocko@xxxxxxxx> · Tue, 28 Nov 2023 15:07:28 +0100

On Wed 22-11-23 16:11:53, Gregory Price wrote:
[...]
> @@ -928,7 +929,16 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
>  		 * vma/shared policy at addr is NULL.  We
>  		 * want to return MPOL_DEFAULT in this case.
>  		 */
> -		mm = current->mm;
> +		if (task == current) {
> +			/*
> +			 * original behavior allows a kernel task changing its
> +			 * own policy to avoid the condition in get_task_mm,
> +			 * so we'll directly access
> +			 */
> +			mm = task->mm;
> +			mmget(mm);

Do we actually have any kernel thread that would call this? Does it
actually make sense to support?

> +		} else
> +			mm = get_task_mm(task);
>  		mmap_read_lock(mm);
>  		vma = vma_lookup(mm, addr);
>  		if (!vma) {
> @@ -947,8 +957,10 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
>  		return -EINVAL;
>  	else {
>  		/* take a reference of the task policy now */
> -		pol = current->mempolicy;
> +		task_lock(task);
> +		pol = task->mempolicy;
>  		mpol_get(pol);
> +		task_unlock(task);
>  	}
>  
>  	if (!pol) {
> @@ -962,12 +974,13 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
>  			vma = NULL;
>  			mmap_read_unlock(mm);
>  			err = lookup_node(mm, addr);
> +			mmput(mm);
>  			if (err < 0)
>  				goto out;
>  			*policy = err;
> -		} else if (pol == current->mempolicy &&
> +		} else if (pol == task->mempolicy &&
>  				pol->mode == MPOL_INTERLEAVE) {
> -			*policy = next_node_in(current->il_prev, pol->nodes);
> +			*policy = next_node_in(task->il_prev, pol->nodes);

This is racy without task_lock which I do not think is helde but it also
seems this is not a big deal. pol is ref. counted so it won't go away
and if the task->mempolicy changes then the return value could be bogus
but this seems acceptable. It would be good to put a comment here that
this is actually deliberate.

-- 
Michal Hocko
SUSE Labs