Re: xfs/folio splat with v6.14-rc1

Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> · Mon, 10 Feb 2025 12:33:10 +0800

On 2025/2/10 12:16, Qu Wenruo wrote:

On 2025/2/10 14:32, Qi Zheng wrote:
Hi Zi,

On 2025/2/10 11:35, Zi Yan wrote:
On 7 Feb 2025, at 17:17, Matthew Wilcox wrote:

On Fri, Feb 07, 2025 at 04:29:36PM +0100, Christian Brauner wrote:
while true; do ./xfs.run.sh "generic/437"; done

allows me to reproduce this fairly quickly.

on holiday, back monday

git bisect points to commit
4817f70c25b6 ("x86: select ARCH_SUPPORTS_PT_RECLAIM if X86_64").
Qi is cc'd.

After deselecting PT_RECLAIM on v6.14-rc1, the issue is gone.
At least, there is no splat after running for more than 300s,
whereas the splat is usually triggered after ~20s with
PT_RECLAIM set.

PT_RECLAIM mainly makes the following two changes:

1) Try to reclaim page table pages during madvise(MADV_DONTNEED)
2) Unconditionally select MMU_GATHER_RCU_TABLE_FREE

Does ./xfs.run.sh "generic/437" perform madvise(MADV_DONTNEED)?
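
For readers unfamiliar with the call in question, here is a minimal userspace sketch of madvise(MADV_DONTNEED) on a private anonymous mapping. It only illustrates the syscall that change 1) hooks into; it is not taken from generic/437, it does not show whether that test issues the call, and it is not a reproducer for the splat. The function name dontneed_zeroes_page() is hypothetical, and the zero-fill check relies on Linux semantics for private anonymous memory.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Maps one private anonymous page, dirties it, drops it with
 * MADV_DONTNEED, and checks that the next access sees zeroes.
 * Returns 1 if the page reads back as zero, 0 if not, -1 on error.
 */
int dontneed_zeroes_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Touch the page so that it is actually populated. */
    memset(p, 0xAB, len);

    /* Tell the kernel the contents are no longer needed. */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    /* On Linux, the range is refaulted as zero-filled pages. */
    int zeroed = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            zeroed = 0;
            break;
        }
    }

    munmap(p, len);
    return zeroed;
}
```

Calling dontneed_zeroes_page() from a small main() and checking for a return value of 1 is enough to exercise the path from userspace. Running a test under strace -e trace=madvise would be one way to answer the question above.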

Anyway, I will try to reproduce it locally and troubleshoot it.

BTW, btrfs can also reproduce the same problem on x86_64, with all
default mount options.
Normally fewer than 32 runs of generic/437 (done by "./check -I 32
generic/437" of fstests) are enough to trigger it.
In my case, I go for 128 runs to be extra sure.

And it no longer reproduces after deselecting the CONFIG_PT_RECLAIM
option, so it really looks like 4817f70c25b6 ("x86: select
ARCH_SUPPORTS_PT_RECLAIM if X86_64") is the cause.

Thank you for the information, I will try to reproduce it locally and
troubleshoot it.

And on aarch64 with 64K page size and 4K fs block size, it does not reproduce at all.

Currently, PT_RECLAIM is only supported on x86_64.

Thanks,
Qi

Thanks,
Qu

Thanks!

--
Best Regards,
Yan, Zi