When we hit an unstable pmd, we should retry that pmd once more, because
it means we probably raced with a thp insertion.  Skipping it can be a
problem, as no error will be reported to the caller, so the user will
expect the prot change (e.g. mprotect or userfaultfd wr-protect) to have
been applied when it actually has not.

To achieve this, move the pmd_trans_unstable() call out of
change_pte_range(), which makes the retry easier, as we can keep the
retval of change_pte_range() untouched.

Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
---
 mm/mprotect.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 92d3d3ca390a..e4756899d40c 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -94,15 +94,6 @@ static long change_pte_range(struct mmu_gather *tlb,

 	tlb_change_page_size(tlb, PAGE_SIZE);

-	/*
-	 * Can be called with only the mmap_lock for reading by
-	 * prot_numa so we must check the pmd isn't constantly
-	 * changing from under us from pmd_none to pmd_trans_huge
-	 * and/or the other way around.
-	 */
-	if (pmd_trans_unstable(pmd))
-		return 0;
-
 	/*
 	 * The pmd points to a regular pte so the pmd can't change
 	 * from under us even if the mmap_lock is only hold for
@@ -411,6 +402,7 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 			pages = ret;
 			break;
 		}
+again:
 		/*
 		 * Automatic NUMA balancing walks the tables with mmap_lock
 		 * held for read. It's possible a parallel update to occur
@@ -465,6 +457,16 @@ static inline long change_pmd_range(struct mmu_gather *tlb,
 			}
 			/* fall through, the trans huge pmd just split */
 		}
+
+		/*
+		 * Can be called with only the mmap_lock for reading by
+		 * prot_numa or userfaultfd-wp, so we must check the pmd
+		 * isn't constantly changing from under us from pmd_none to
+		 * pmd_trans_huge and/or the other way around.
+		 */
+		if (pmd_trans_unstable(pmd))
+			goto again;
+
 		pages += change_pte_range(tlb, vma, pmd, addr, next,
 					  newprot, cp_flags);
 next:
--
2.40.1