On Wed 22-11-23 16:11:53, Gregory Price wrote: [...] > @@ -928,7 +929,16 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, > * vma/shared policy at addr is NULL. We > * want to return MPOL_DEFAULT in this case. > */ > - mm = current->mm; > + if (task == current) { > + /* > + * original behavior allows a kernel task changing its > + * own policy to avoid the condition in get_task_mm, > + * so we'll directly access > + */ > + mm = task->mm; > + mmget(mm); Do we actually have any kernel thread that would call this? Does it actually make sense to support? > + } else > + mm = get_task_mm(task); > mmap_read_lock(mm); > vma = vma_lookup(mm, addr); > if (!vma) { > @@ -947,8 +957,10 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, > return -EINVAL; > else { > /* take a reference of the task policy now */ > - pol = current->mempolicy; > + task_lock(task); > + pol = task->mempolicy; > mpol_get(pol); > + task_unlock(task); > } > > if (!pol) { > @@ -962,12 +974,13 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, > vma = NULL; > mmap_read_unlock(mm); > err = lookup_node(mm, addr); > + mmput(mm); > if (err < 0) > goto out; > *policy = err; > - } else if (pol == current->mempolicy && > + } else if (pol == task->mempolicy && > pol->mode == MPOL_INTERLEAVE) { > - *policy = next_node_in(current->il_prev, pol->nodes); > + *policy = next_node_in(task->il_prev, pol->nodes); This is racy without task_lock which I do not think is helde but it also seems this is not a big deal. pol is ref. counted so it won't go away and if the task->mempolicy changes then the return value could be bogus but this seems acceptable. It would be good to put a comment here that this is actually deliberate. -- Michal Hocko SUSE Labs