On Thu, Jan 18, 2024 at 03:20:21PM +1100, Dave Chinner wrote:
> On Mon, Dec 18, 2023 at 10:01:34PM +0800, Zorro Lang wrote:
> > Hi,
> >
> > Recently I hit a crash [1] on s390x with 64k directory block size xfs
> > (-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1),
> > even not panic, a assertion failure will happen.
> >
> > I found it from an old downstream kernel at first, then reproduced it
> > on latest upstream mainline linux (v6.7-rc6). Can't be sure how long
> > time this issue be there, just reported it at first.
> > [ 978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> > [ 979.216565] XFS (loop3): Starting recovery (logdev: internal)
> > [ 979.225078] XFS (loop3): Bad dir block magic!
> > [ 979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
>
> Ok, so we got a XFS_BLFT_DIR_BLOCK_BUF buf log item, but the object
> that we recovered into the buffer did not have a
> XFS_DIR3_BLOCK_MAGIC type.
>
> Perhaps the buf log item didn't contain the first 128 bytes of the
> buffer (or maybe any of it), and so didn't recovery the magic number?
>
> Can you reproduce this with CONFIG_XFS_ASSERT_FATAL=y so the failure
> preserves the journal contents when the issue triggers, then get a
> metadump of the filesystem so I can dig into the contents of the
> journal? I really want to see what is in the buf log item we fail
> to recover.
>
> We don't want recovery to continue here because that will result in
> the journal being fully recovered and updated and so we won't be
> able to replay the recovery failure from it.
>
> i.e. if we leave the buffer we recovered in memory without failure
> because the ASSERT is just a warn, we continue onwards and likely
> then recover newer changes over the top of it. This may or may
> not result in a correctly recovered buffer, depending on what parts
> of the buffer got relogged.
>
> IOWs, we should be expecting corruption to be detected somewhere
> further down the track once we've seen this warning, and really we
> should be aborting journal recovery if we see a mismatch like this.
>
> .....
>
> > [ 979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020
> > [ 979.227732] XFS (loop3): Unmount and run xfs_repair
> > [ 979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
> > [ 979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3...........
>
> XDB3 is XFS_DIR3_BLOCK_MAGIC, so it's the right type, but given it's
> the tail pointer (btp->count) that is bad, this indicates that maybe
> the tail didn't get written correctly by subsequent checkpoint
> recoveries. We don't know, because that isn't in the output below.
>
> It likely doesn't matter, because I think the problem is either a
> runtime problem writing bad stuff into the journal, or a recovery
> problem failing to handle the contents correctly. Hence the need for
> a metadump.

Hi Dave,

Thanks for your reply. It's been a month since I last reported this bug.
Now I can't reproduce this issue on the latest upstream mainline linux or
the xfs-linux for-next branch. I've run the same test ~1000 times and
still can't reproduce it.

If you think it might not be fixed but merely hidden, I can try again on
the older kernel that reproduced this bug last time, to get a metadump.
What do you think?

Thanks,
Zorro

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>
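
For readers not familiar with this code path: the check Dave describes pairs
the buffer type recorded in the buf log item (XFS_BLFT_DIR_BLOCK_BUF) with the
magic number found at the start of the recovered buffer. Below is a simplified,
standalone sketch of that style of cross-check; it is not the actual
fs/xfs/xfs_buf_item_recover.c code, and the helper names (validate_buf_type,
be32_to_host, enum blft) are illustrative only. It just shows why a buffer
whose first bytes were never replayed from the log item trips the
"Bad dir block magic!" warning and the ASSERT.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* on-disk magics are stored big-endian; values from the XFS on-disk format */
#define XFS_DIR2_BLOCK_MAGIC	0x58443242u	/* "XD2B" */
#define XFS_DIR3_BLOCK_MAGIC	0x58444233u	/* "XDB3" */

/* hypothetical, trimmed-down version of the logged buffer type */
enum blft { BLFT_UNKNOWN, BLFT_DIR_BLOCK };

static uint32_t be32_to_host(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical stand-in for the post-replay validation step: the type comes
 * from the buf log item, the magic comes from the buffer contents that were
 * just recovered.  If the log item never covered the first bytes of the
 * buffer, the magic is whatever was there before and the check fires.
 */
static int validate_buf_type(enum blft type, const unsigned char *buf)
{
	uint32_t magic = be32_to_host(buf);

	switch (type) {
	case BLFT_DIR_BLOCK:
		if (magic != XFS_DIR2_BLOCK_MAGIC &&
		    magic != XFS_DIR3_BLOCK_MAGIC) {
			fprintf(stderr, "Bad dir block magic!\n");
			assert(0);	/* like ASSERT(0) with fatal asserts */
			return -1;
		}
		return 0;
	default:
		return -1;
	}
}

int main(void)
{
	unsigned char good[128] = { 'X', 'D', 'B', '3' };	/* magic replayed */
	unsigned char bad[128]  = { 0 };			/* magic missing  */

	printf("good: %d\n", validate_buf_type(BLFT_DIR_BLOCK, good));	/* 0 */
	printf("bad:  %d\n", validate_buf_type(BLFT_DIR_BLOCK, bad));	/* aborts */
	return 0;
}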