On 5/23/23 4:32 PM, Justin Forbes wrote:
On Wed, May 03, 2023 at 09:13:18AM +1000, Dave Chinner wrote:
On Tue, May 02, 2023 at 05:13:09PM -0500, Mike Pastore wrote:
On Tue, May 2, 2023, 5:03 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
If you can find a minimal reproducer, that would help a lot in
diagnosing the issue.
This is great, thank you. I'll get to work.
One note: the problem occurred with and without crc=0, so we can rule that
out at least.
Yes, I noticed that. My point was more that we have much more
confidence in crc=1 filesystems because they have much more robust
verification of the on-disk format and won't fail log recovery in
the way you noticed. The verification with crc=1 configured
filesystems is also known to catch issues caused by
memory corruption more frequently, often preventing such occurrences
from corrupting the on-disk filesystem.
Hence if you are seeing corruption events, you really want to be
using "-m crc=1" (default config) filesystems...
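
For anyone wanting to confirm which format an existing filesystem uses, here
is a minimal sketch in Python. It assumes you have already captured the text
printed by xfs_info for the mount point and that it contains the usual
"crc=N" field; the function name and the sample text are hypothetical and
not taken from any system discussed in this thread.

```python
import re
from typing import Optional


def crc_enabled(xfs_info_text: str) -> Optional[bool]:
    """Report whether xfs_info-style text shows crc=1.

    Returns True for crc=1, False for crc=0, and None when no crc
    field is present (for example, output from a very old xfsprogs).
    """
    match = re.search(r"\bcrc=(\d)\b", xfs_info_text)
    if match is None:
        return None
    return match.group(1) == "1"


# Abbreviated, illustrative sample in the usual xfs_info layout.
SAMPLE = """\
meta-data=/dev/sdb1 isize=512 agcount=4, agsize=65536 blks
         =          sectsz=512 attr=2, projid32bit=1
         =          crc=1 finobt=1, sparse=1, rmapbt=0
"""

if __name__ == "__main__":
    print(crc_enabled(SAMPLE))
```

A None result only means the field was not found in the text, not that
checksums are off, so it is worth looking at the raw output in that case.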
Upon trying to roll out 6.3.3 to Fedora users, it seems that we have a
few hitting this reliably with 6.3 kernels. It is certainly not all
users of XFS though, as I use it extensively and haven't run across it.
The most responsive users who can reproduce all seem to be running on
xfs filesystems that were created a few years ago, and some even can't
reproduce it on their newer systems. Either way, it is a widespread
enough problem that I can't roll out 6.3 kernels to stable releases
until it is fixed.
https://bugzilla.redhat.com/show_bug.cgi?id=2208553
The two cases in that bug look very similar, and are on similar
hardware, and they also look (to me) like different problems than the
one reported here.
Those reporters are reading garbage data from disk; this one seems to be
in-memory corruption of an inode down a xfs_free_eofblocks() path...
I could be wrong, but I'm not seeing a connection between this report
and the bugzilla report, at first glance.
Thanks,
-Eric