On Fri, Jun 2, 2023 at 4:06 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> When hit unstable pmd, we should retry the pmd once more because it means
> we probably raced with a thp insertion.
>
> Skipping it might be a problem as no error will be reported to the caller.
> I assume it means the user will expect prot changed (e.g. mprotect or
> userfaultfd wr-protections) applied but it's actually not.

IIRC, mprotect() holds write mmap_lock, so it should not matter. PROT
NUMA holds read mmap_lock, but returning 0 also doesn't matter (of
course retry is fine too). Just skip that 2M area. The userfaultfd-wp
is your call :-)

>
> To achieve it, move the pmd_trans_unstable() call out of change_pte_range()
> which will make the retry easier, as we can keep the retval of
> change_pte_range() untouched.
>
> Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> ---
>  mm/mprotect.c | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 92d3d3ca390a..e4756899d40c 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -94,15 +94,6 @@ static long change_pte_range(struct mmu_gather *tlb,
>
>  	tlb_change_page_size(tlb, PAGE_SIZE);
>
> -	/*
> -	 * Can be called with only the mmap_lock for reading by
> -	 * prot_numa so we must check the pmd isn't constantly
> -	 * changing from under us from pmd_none to pmd_trans_huge
> -	 * and/or the other way around.
> -	 */
> -	if (pmd_trans_unstable(pmd))
> -		return 0;
> -
>  	/*
>  	 * The pmd points to a regular pte so the pmd can't change
>  	 * from under us even if the mmap_lock is only hold for
> @@ -411,6 +402,7 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
>  				pages = ret;
>  				break;
>  			}
> +again:
>  		/*
>  		 * Automatic NUMA balancing walks the tables with mmap_lock
>  		 * held for read. It's possible a parallel update to occur
> @@ -465,6 +457,16 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
>  			}
>  			/* fall through, the trans huge pmd just split */
>  		}
> +
> +		/*
> +		 * Can be called with only the mmap_lock for reading by
> +		 * prot_numa or userfaultfd-wp, so we must check the pmd
> +		 * isn't constantly changing from under us from pmd_none to
> +		 * pmd_trans_huge and/or the other way around.
> +		 */
> +		if (pmd_trans_unstable(pmd))
> +			goto again;
> +
>  		pages += change_pte_range(tlb, vma, pmd, addr, next,
>  					  newprot, cp_flags);
>  next:
> --
> 2.40.1
>
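
For anyone following along, the difference between "skip" (return 0) and "retry" (goto again) can be sketched in plain userspace C. This is only an illustration under simplifying assumptions, not kernel code: struct fake_pmd, fake_pmd_unstable(), change_range_skip() and change_range_retry() are hypothetical names made up here, and the "race" is modelled as a counter rather than a real concurrent THP insertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pmd entry; NOT a kernel type. */
struct fake_pmd {
	int unstable_reads;	/* how many more checks will observe a racing update */
	long ptes;		/* entries that would be changed once the pmd is stable */
};

/* Hypothetical analogue of pmd_trans_unstable(): true while a race is visible. */
static bool fake_pmd_unstable(struct fake_pmd *pmd)
{
	if (pmd->unstable_reads > 0) {
		pmd->unstable_reads--;
		return true;
	}
	return false;
}

/* Old behaviour: an unstable entry contributes 0 and no error is reported. */
static long change_range_skip(struct fake_pmd *pmds, size_t n)
{
	long pages = 0;

	assert(pmds != NULL);
	for (size_t i = 0; i < n; i++) {
		if (fake_pmd_unstable(&pmds[i]))
			continue;	/* silently skipped; the caller cannot tell */
		pages += pmds[i].ptes;
	}
	return pages;
}

/* Patched behaviour: re-check the same entry until it is stable. */
static long change_range_retry(struct fake_pmd *pmds, size_t n)
{
	long pages = 0;

	assert(pmds != NULL);
	for (size_t i = 0; i < n; i++) {
again:
		if (fake_pmd_unstable(&pmds[i]))
			goto again;	/* mirrors the "goto again" in the patch */
		pages += pmds[i].ptes;
	}
	return pages;
}
```

With two entries of 512 ptes each, where the second one is seen as unstable once, the skip variant reports 512 changed entries and the retry variant reports 1024. Whether the missing 512 matters depends on the caller, which is the point made above for mprotect(), prot_numa and userfaultfd-wp.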