Re: [PATCH 6.1.y 6.6.y 0/3] mm/filemap: fix page cache corruption with large folios

On Sat, 22 Mar 2025 at 05:17, Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
>
> At this point, XFS large folios appear to be unreliable in the 6.1.y
> stable kernel.

I suspect it's a bad idea to start using large folios on stable
kernels. Even with the page cache corruption fix, 6.1 is old enough
that I don't know what other fixes have happened since.

It's not like the large folio code has been _hugely_ problematic, but
there have definitely been various small fixes related to it, and maybe
some of them have missed stable.

So I think stable should revert the "turn on large folios" in general.

That said:

> We would appreciate any suggestions, such as adding debug messages to
> the kernel source code, to help us diagnose the root cause.

I think the first thing to do - if you can - is to make sure that a
much more *current* kernel actually is ok.

Without a consistent reproducer it's going to be hard to really bisect
things, but the first step should be to make sure it's not some new
kind of issue that happens to be unique to what you do.

By "current" I don't necessarily mean "very latest" - 6.14 is going to
be released this weekend - but certainly something much more recent
than 6.1-stable.

Because while the stable trees obviously collect modern fixes, subtler
issues can easily fall through if people don't realize how important a
particular fix was. Sometimes the "obvious cleanup patches" end up
fixing things unintentionally just by making the code more
straightforward and correcting something in the process.

Without any real clues outside of "corruption", it's hard to even
guess whether it's core MM or VFS code, or some XFS-specific thing.
There has been large folio work in all three areas.

So I suspect unless somebody has something in mind, "bisect it" to at
least partially narrow it down would be the only thing to do.
Bisecting to one particular commit is obviously the best scenario, but
even a result like "the issue still happens in 6.12, but is gone in
6.13" might help give people more of a place to start looking.
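
As a rough illustration of that kind of coarse narrowing, here is a
minimal Python sketch that binary-searches an ordered list of releases
for the first one where the problem no longer shows up. Everything in
it is an assumption made for illustration: the function name
first_good_release, the is_bad callback (standing in for "boot this
kernel, run the workload, check for corruption"), and the release list
are hypothetical. It also assumes the oldest release is known bad, the
newest is known good, and the bug stays fixed once fixed. Without a
consistent reproducer, is_bad would have to run long enough (or often
enough) to be trusted.

```python
from typing import Callable, Sequence


def first_good_release(releases: Sequence[str],
                       is_bad: Callable[[str], bool]) -> str:
    """Return the earliest release where the bug no longer reproduces.

    Assumes releases are ordered oldest to newest, releases[0] is
    known bad, and releases[-1] is known good.
    """
    lo, hi = 0, len(releases) - 1  # lo: known bad, hi: known good
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(releases[mid]):
            lo = mid   # bug still present: the fix landed later
        else:
            hi = mid   # bug gone: the fix landed here or earlier
    return releases[hi]


# Hypothetical data: pretend the corruption reproduces up to 6.12.
releases = ["6.1", "6.6", "6.12", "6.13", "6.14"]
known_bad = {"6.1", "6.6", "6.12"}
result = first_good_release(releases, lambda v: v in known_bad)
print(result)
```

The same search over individual commits between the last bad and the
first good release is what "git bisect" automates, with the good/bad
labels swapped when hunting for a fix rather than a regression.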

             Linus



