Hi all,
On 2025/2/10 12:02, Qi Zheng wrote:
Hi Zi,
On 2025/2/10 11:35, Zi Yan wrote:
On 7 Feb 2025, at 17:17, Matthew Wilcox wrote:
On Fri, Feb 07, 2025 at 04:29:36PM +0100, Christian Brauner wrote:
while true; do ./xfs.run.sh "generic/437"; done
allows me to reproduce this fairly quickly.
On holiday, back Monday.
git bisect points to commit
4817f70c25b6 ("x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64").
Qi is cc'd.
After deselecting PT_RECLAIM on v6.14-rc1, the issue is gone.
At least, there is no splat after running for more than 300s,
whereas the splat is usually triggered after ~20s with
PT_RECLAIM set.
PT_RECLAIM mainly makes the following two changes:
1) It tries to reclaim page table pages during madvise(MADV_DONTNEED).
2) It unconditionally selects MMU_GATHER_RCU_TABLE_FREE.
Does ./xfs.run.sh "generic/437" perform madvise(MADV_DONTNEED)?
Anyway, I will try to reproduce it locally and troubleshoot it.
I reproduced it locally and it was indeed caused by PT_RECLAIM.
The root cause is that the PTE lock may be released midway through
zap_pte_range() and the walk then retried. In that case, a PTE entry
that was none when it was first scanned may have been refilled with a
physical page before the retry completes.
So we should recheck all PTE entries in this case:
diff --git a/mm/memory.c b/mm/memory.c
index a8196ae72e9ae..ca1b133a288b5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1721,7 +1721,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	pmd_t pmdval;
 	unsigned long start = addr;
 	bool can_reclaim_pt = reclaim_pt_is_enabled(start, end, details);
-	bool direct_reclaim = false;
+	bool direct_reclaim = true;
 	int nr;
 
 retry:
@@ -1736,8 +1736,10 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	do {
 		bool any_skipped = false;
 
-		if (need_resched())
+		if (need_resched()) {
+			direct_reclaim = false;
 			break;
+		}
 
 		nr = do_zap_pte_range(tlb, vma, pte, addr, end, details, rss,
 				      &force_flush, &force_break, &any_skipped);
@@ -1745,11 +1747,12 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			can_reclaim_pt = false;
 		if (unlikely(force_break)) {
 			addr += nr * PAGE_SIZE;
+			direct_reclaim = false;
 			break;
 		}
 	} while (pte += nr, addr += PAGE_SIZE * nr, addr != end);
 
-	if (can_reclaim_pt && addr == end)
+	if (can_reclaim_pt && direct_reclaim && addr == end)
 		direct_reclaim = try_get_and_clear_pmd(mm, pmd, &pmdval);
 
 	add_mm_rss_vec(mm, rss);
I tested the above code and no bugs were reported for a while. Does it
work for you?
Thanks,
Qi
Thanks!
--
Best Regards,
Yan, Zi