On Mon, Dec 18, 2023 at 10:01:34PM +0800, Zorro Lang wrote:
> Hi,
> 
> Recently I hit a crash [1] on s390x with a 64k directory block size
> xfs (-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1).
> Even when it doesn't panic, an assertion failure happens.
> 
> I found it on an old downstream kernel at first, then reproduced it
> on the latest upstream mainline linux (v6.7-rc6). I can't be sure how
> long this issue has been there, so I'm just reporting it first.
> 
> [ 978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> [ 979.216565] XFS (loop3): Starting recovery (logdev: internal)
> [ 979.225078] XFS (loop3): Bad dir block magic!
> [ 979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414

Ok, so we got an XFS_BLFT_DIR_BLOCK_BUF buf log item, but the object
that we recovered into the buffer did not have an XFS_DIR3_BLOCK_MAGIC
type. Perhaps the buf log item didn't contain the first 128 bytes of
the buffer (or maybe any of it), and so didn't recover the magic
number?

Can you reproduce this with CONFIG_XFS_ASSERT_FATAL=y so the failure
preserves the journal contents when the issue triggers, then get a
metadump of the filesystem so I can dig into the contents of the
journal? I really want to see what is in the buf log item we fail to
recover.

We don't want recovery to continue here, because that will result in
the journal being fully recovered and updated, and so we won't be able
to replay the recovery failure from it. i.e. if we leave the buffer we
recovered in memory without failure because the ASSERT is just a
warning, we continue onwards and likely then recover newer changes
over the top of it. This may or may not result in a correctly
recovered buffer, depending on what parts of the buffer got relogged.

IOWs, we should expect corruption to be detected somewhere further
down the track once we've seen this warning, and really we should be
aborting journal recovery if we see a mismatch like this.

.....

> [ 979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020
> [ 979.227732] XFS (loop3): Unmount and run xfs_repair
> [ 979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
> [ 979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3...........

XDB3 is XFS_DIR3_BLOCK_MAGIC, so it's the right type, but given it's
the tail pointer (btp->count) that is bad, this indicates that maybe
the tail didn't get written correctly by subsequent checkpoint
recoveries. We don't know, because that isn't in the output below.

It likely doesn't matter, because I think the problem is either a
runtime problem writing bad stuff into the journal, or a recovery
problem failing to handle the contents correctly. Hence the need for
a metadump.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
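For concreteness, here is why partial logging of the buffer can produce
exactly these two symptoms. In a v5 single-block directory the magic
lives in the first bytes of the block and the tail (btp->count) lives
in the last bytes, so with a 64k directory block they sit roughly 64KiB
apart and land in different 128-byte logged chunks of a buf log item.
The sketch below is a minimal user-space illustration, assuming
simplified copies of the on-disk structures from
fs/xfs/libxfs/xfs_da_format.h; it is not the kernel recovery code.

/*
 * Minimal user-space sketch: where the dir3 block magic and the block
 * tail live inside a 64k directory block. Struct definitions are
 * simplified copies of the on-disk layout in
 * fs/xfs/libxfs/xfs_da_format.h, not the kernel code.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <endian.h>		/* be64toh() (glibc/musl) */
#include <arpa/inet.h>		/* ntohl() for be32 fields */

#define XFS_DIR3_BLOCK_MAGIC	0x58444233u	/* "XDB3" */
#define DIR_BLKSIZE		65536		/* -n size=65536 */

/* leading fields of struct xfs_dir3_blk_hdr (all big-endian on disk) */
struct dir3_blk_hdr {
	uint32_t	magic;
	uint32_t	crc;
	uint64_t	blkno;
	/* lsn, uuid and owner follow on disk */
};

/* struct xfs_dir2_block_tail: the last 8 bytes of the block */
struct dir2_block_tail {
	uint32_t	count;	/* count of leaf entries */
	uint32_t	stale;	/* count of stale leaf entries */
};

int main(void)
{
	/* first 16 bytes copied from the corrupted buffer dump above */
	static unsigned char buf[DIR_BLKSIZE] = {
		0x58, 0x44, 0x42, 0x33, 0x00, 0x00, 0x00, 0x00,
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x20,
	};
	struct dir3_blk_hdr hdr;
	struct dir2_block_tail tail;

	/* the magic recovery validates is in the first logged chunk... */
	memcpy(&hdr, buf, sizeof(hdr));
	printf("magic 0x%08x %s, blkno 0x%llx\n", ntohl(hdr.magic),
	       ntohl(hdr.magic) == XFS_DIR3_BLOCK_MAGIC ? "(XDB3)" : "(bad)",
	       (unsigned long long)be64toh(hdr.blkno));

	/* ...but btp->count is ~64KiB away, in a different 128-byte chunk */
	memcpy(&tail, buf + DIR_BLKSIZE - sizeof(tail), sizeof(tail));
	printf("btp->count %u at offset %zu\n", ntohl(tail.count),
	       DIR_BLKSIZE - sizeof(tail));
	return 0;
}

Built with a plain cc invocation, this prints the XDB3 magic and blkno
0x1020 (matching the "xfs_dir3_block block 0x1020" complaint above)
from the header chunk, and a btp->count read from a chunk 65528 bytes
away; a buf log item that contains only one of those two chunks will
recover one value but not the other.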