Michal Hocko <mhocko@xxxxxxxx> writes: > On Mon 29-11-21 16:16:05, Aneesh Kumar K.V wrote: >> Michal Hocko <mhocko@xxxxxxxx> writes: >> >> > On Tue 16-11-21 12:12:37, Aneesh Kumar K.V wrote: > [...] >> >> +SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, len, >> >> + unsigned long, home_node, unsigned long, flags) >> >> +{ >> >> + struct mm_struct *mm = current->mm; >> >> + struct vm_area_struct *vma; >> >> + struct mempolicy *new; >> >> + unsigned long vmstart; >> >> + unsigned long vmend; >> >> + unsigned long end; >> >> + int err = -ENOENT; >> >> + >> >> + if (start & ~PAGE_MASK) >> >> + return -EINVAL; >> >> + /* >> >> + * flags is used for future extension if any. >> >> + */ >> >> + if (flags != 0) >> >> + return -EINVAL; >> >> + >> >> + if (!node_online(home_node)) >> >> + return -EINVAL; >> > >> > You really want to check the home_node before dereferencing the mask. >> > >> >> Any reason why we want to check for home node first? > > Because the given node is an index to node_states[N_ONLINE] bitmap. I do > not think we do range checking there. Will add this if (home_node >= MAX_NUMNODES || !node_online(home_node)) return -EINVAL; > >> >> + len = (len + PAGE_SIZE - 1) & PAGE_MASK; >> >> + end = start + len; >> >> + >> >> + if (end < start) >> >> + return -EINVAL; >> >> + if (end == start) >> >> + return 0; >> >> + mmap_write_lock(mm); >> >> + vma = find_vma(mm, start); >> >> + for (; vma && vma->vm_start < end; vma = vma->vm_next) { >> >> + >> >> + vmstart = max(start, vma->vm_start); >> >> + vmend = min(end, vma->vm_end); >> >> + new = mpol_dup(vma_policy(vma)); >> >> + if (IS_ERR(new)) { >> >> + err = PTR_ERR(new); >> >> + break; >> >> + } >> >> + /* >> >> + * Only update home node if there is an existing vma policy >> >> + */ >> >> + if (!new) >> >> + continue; >> > >> > Your changelog only mentions MPOL_BIND and MPOL_PREFERRED_MANY as >> > supported but you seem to be applying the home node to all existing >> > policieso >> >> >> The restriction is done in policy_node. >> >> @@ -1801,6 +1856,11 @@ static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd) >> WARN_ON_ONCE(policy->mode == MPOL_BIND && (gfp & __GFP_THISNODE)); >> } >> >> + if ((policy->mode == MPOL_BIND || >> + policy->mode == MPOL_PREFERRED_MANY) && >> + policy->home_node != NUMA_NO_NODE) >> + return policy->home_node; >> + >> return nd; >> } > > But you do allow to set the home node also for other policies and that > means that a default MPOL_INTERLEAVE would be different from the one > with home_node set up even though they behave exactly the same. I agree that there is no error returned if we try to set the home_node for other memory policies. But there should not be any behaviour differences. We ignore home_node for policies other than BIND and PREFERRED_MANY. The reason I avoided erroring out for other policies was to simplify the error handling. For example, for a range of addr with a mix of memory policy MPOLD_BIND and MPOL_INTERLEAVE what should be the state after the above system call? We could say, we ignore setting home_node for vma with policy MPOL_INTERLEAVE and leave the home node set for vma with policy MPOL_BIND. Or should we make the system call return error also undoing the changes done for vmas for which we have set the home_node? -aneesh