On 10 Feb 2025, at 3:18, Qi Zheng wrote:

> Hi all,
>
> On 2025/2/10 12:02, Qi Zheng wrote:
>> Hi Zi,
>>
>> On 2025/2/10 11:35, Zi Yan wrote:
>>> On 7 Feb 2025, at 17:17, Matthew Wilcox wrote:
>>>
>>>> On Fri, Feb 07, 2025 at 04:29:36PM +0100, Christian Brauner wrote:
>>>>> while true; do ./xfs.run.sh "generic/437"; done
>>>>>
>>>>> allows me to reproduce this fairly quickly.
>>>>
>>>> on holiday, back monday
>>>
>>> git bisect points to commit
>>> 4817f70c25b6 ("x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64").
>>> Qi is cc'd.
>>>
>>> After deselect PT_RECLAIM on v6.14-rc1, the issue is gone.
>>> At least, no splat after running for more than 300s,
>>> whereas the splat is usually triggered after ~20s with
>>> PT_RECLAIM set.
>>
>> The PT_RECLAIM mainly made the following two changes:
>>
>> 1) try to reclaim page table pages during madvise(MADV_DONTNEED)
>> 2) Unconditionally select MMU_GATHER_RCU_TABLE_FREE
>>
>> Will ./xfs.run.sh "generic/437" perform the madvise(MADV_DONTNEED)?
>>
>> Anyway, I will try to reproduce it locally and troubleshoot it.
>
> I reproduced it locally and it was indeed caused by PT_RECLAIM.
>
> The root cause is that the pte lock may be released midway in
> zap_pte_range() and then retried. In this case, the originally none pte
> entry may be refilled with physical pages.
>
> So we should recheck all pte entries in this case:
>
> diff --git a/mm/memory.c b/mm/memory.c
> index a8196ae72e9ae..ca1b133a288b5 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1721,7 +1721,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  	pmd_t pmdval;
>  	unsigned long start = addr;
>  	bool can_reclaim_pt = reclaim_pt_is_enabled(start, end, details);
> -	bool direct_reclaim = false;
> +	bool direct_reclaim = true;
>  	int nr;
>
>  retry:
> @@ -1736,8 +1736,10 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  	do {
>  		bool any_skipped = false;
>
> -		if (need_resched())
> +		if (need_resched()) {
> +			direct_reclaim = false;
>  			break;
> +		}
>
>  		nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, rss,
>  				      &force_flush, &force_break, &any_skipped);
> @@ -1745,11 +1747,12 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  			can_reclaim_pt = false;
>  		if (unlikely(force_break)) {
>  			addr += nr * PAGE_SIZE;
> +			direct_reclaim = false;
>  			break;
>  		}
>  	} while (pte += nr, addr += PAGE_SIZE * nr, addr != end);
>
> -	if (can_reclaim_pt && addr == end)
> +	if (can_reclaim_pt && direct_reclaim && addr == end)
>  		direct_reclaim = try_get_and_clear_pmd(mm, pmd, &pmdval);
>
>  	add_mm_rss_vec(mm, rss);
>
> I tested the above code and no bugs were reported for a while. Does it
> work for you?

It also fixed the issue I see on xfs as well.

Tested-by: Zi Yan <ziy@xxxxxxxxxx>

--
Best Regards,
Yan, Zi