On 3/8/22 22:32, Hugh Dickins wrote: > On Tue, 8 Mar 2022, Liam Howlett wrote: >> * Hugh Dickins <hughd@xxxxxxxxxx> [220304 21:29]: >> > On Sat, 5 Mar 2022, Liam Howlett wrote: >> > > * Hugh Dickins <hughd@xxxxxxxxxx> [220304 17:48]: >> > > > On Fri, 4 Mar 2022, Liam Howlett wrote: >> > > > > * Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> [220304 13:49]: >> > > > > > * Hugh Dickins <hughd@xxxxxxxxxx> [220303 23:36]: >> > > > > >> > > > > I just thought of something after my initial email >> > > > > >> > > > > How does the ->set_policy() requirement on tmpfs play out for the >> > > > > mpol_equal() check earlier in that for loop? >> > > > >> > > > It took me a while to page all this back in (and remind myself of >> > > > what is case 8) to answer that question! >> > > > >> > > > The answer is that the mpol_equal() check at the top of the loop is on >> > > > an existing, unmodified vma; so it's right to assume that any necessary >> > > > set_policy() has already been done. >> > > > >> > > > Whereas the mpol_equal() check being removed in this patch, is being >> > > > done on a vma which may have just been extended to cover a greater range: >> > > > so although the relevant set_policy() may have already been done on a part >> > > > of its range, there is now another part which needs the policy applied. >> > > >> > > Doesn't the policy get checked during vma_merge()? Specifically the >> > > mpol_equal(policy, vma_policy(next)) check? >> > >> > Sorry, I'm reduced to the unhelpful reply of "Yes. So?" >> > >> > If vma_merge() finds that vma's new_pol allows it to be merged with prev, >> > that still requires mbind_range() (or its call to vma_replace_policy()) >> > to set_policy() on prev (now assigned to vma), to apply that new_pol to >> > the extension of prev - vma_merge() would have checked mpol_equal(), >> > but would not have done the set_policy(). >> >> I must be missing something. If mpol_equal() isn't sufficient to ensure >> we don't need to set_policy(), then why are the other vma_merge() cases >> okay - such as madvise_update_vma() and mlock_fixup()? Won't the mem >> policy change in the same way in these cases? > > mlock provides a good example to compare. > > Mlocking pages is the business of mlock(), and mlock_fixup() needs to > attend to mm->locked_vm, and calling something to mark as PageMlocked > those pages already in the area now covered by mlock. But it doesn't > need to worry about set_policy(), that's not its business, and is > unaffected by mlock changes (though merging of vmas needs mpol_equal() > to check that policy is the same, and merging and splitting of vmas > need to maintain the refcount of the shared policy if any). > > Whereas NUMA mempolicy is the business of mbind(), and mbind_range() > needs to attend to vma->vm_policy, and if it's a mapping of something > supporting a shared set_policy(), call that to establish the new range > on the object mapped. But it doesn't need to worry about mm->locked_vm > or whether pages are Mlocked, that's not its business, and is unaffected > by mbind changes (though merging of vmas needs to check VM_LOCKED among > other flags to check that they are the same before it can merge). So if I understand correctly, we have case 8 of vma_merge(): AAAA PPPPNNNNXXXX becomes PPPPXXXXXXXX 8 N is vma with some old policy different from new_pol A is the range where we change to new policy new_pol, which happens to be the same as existing policy of X Thus vma_merge() extends vma X to include range A - the vma N vma_merge() succeeds because it's passed new_pol to do the compatibility checks (although N still has the previous policy) Before Hugh's patch we would then realize "oh X already has new_pol, nothing to do". Note that this AFAICS doesn't affect actual pages migration between nodes, because that happens outside of mbind_range(). But it causes us to skip vma_replace_policy(), which causes us to skip vm_ops->set_policy, where tmpfs does something important (we could maybe argue that Hugh didn't specify the user visible effects of this exactly enough :) what is "leaving the new mbind unenforced" - are pages not migrated in this case?). HTH (if I'm right), Vlastimil > Does that help? > > Hugh >