On Mon, Dec 18, 2023 at 10:01:34PM +0800, Zorro Lang wrote:
> Hi,
> 
> Recently I hit a crash [1] on s390x with a 64k directory block size
> xfs (-n size=65536 -m crc=1,finobt=1,reflink=1,rmapbt=0,bigtime=1,inobtcount=1).
> Even when it doesn't panic, an assertion failure happens.
> 
> I found it on an old downstream kernel at first, then reproduced it
> on the latest upstream mainline linux (v6.7-rc6). I can't be sure how
> long this issue has been there, so I'm just reporting it first.
> 
> [ 978.591588] XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
> [ 979.216565] XFS (loop3): Starting recovery (logdev: internal)
> [ 979.225078] XFS (loop3): Bad dir block magic!
> [ 979.225081] XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414

Ok, so we got an XFS_BLFT_DIR_BLOCK_BUF buf log item, but the object
that we recovered into the buffer did not have an XFS_DIR3_BLOCK_MAGIC
type. Perhaps the buf log item didn't contain the first 128 bytes of
the buffer (or maybe any of it), and so didn't recover the magic
number?

Can you reproduce this with CONFIG_XFS_ASSERT_FATAL=y so the failure
preserves the journal contents when the issue triggers, then get a
metadump of the filesystem so I can dig into the contents of the
journal? I really want to see what is in the buf log item we fail to
recover.

We don't want recovery to continue here, because that will result in
the journal being fully recovered and updated, and so we won't be able
to replay the recovery failure from it. i.e. if we leave the buffer we
recovered in memory without failure because the ASSERT is just a
warning, we continue onwards and likely then recover newer changes
over the top of it. This may or may not result in a correctly
recovered buffer, depending on what parts of the buffer got relogged.

IOWs, we should expect corruption to be detected somewhere further
down the track once we've seen this warning, and really we should be
aborting journal recovery if we see a mismatch like this.

.....

> [ 979.227613] XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020
> [ 979.227732] XFS (loop3): Unmount and run xfs_repair
> [ 979.227733] XFS (loop3): First 128 bytes of corrupted metadata buffer:
> [ 979.227736] 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3...........

XDB3 is XFS_DIR3_BLOCK_MAGIC, so it's the right type, but given it's
the tail pointer (btp->count) that is bad, this indicates that maybe
the tail didn't get written correctly by subsequent checkpoint
recoveries. We don't know, because that isn't in the output below.

It likely doesn't matter, because I think the problem is either a
runtime problem writing bad stuff into the journal, or a recovery
problem failing to handle the contents correctly. Hence the need for
a metadump.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
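For concreteness, here is why partial logging of the buffer can produce
exactly these two symptoms. In a v5 single-block directory the magic
lives in the first bytes of the block and the tail (btp->count) lives
in the last bytes, so with a 64k directory block they sit roughly 64KiB
apart and land in different 128-byte logged chunks of a buf log item.
The sketch below is a minimal user-space illustration, assuming
simplified copies of the on-disk structures from
fs/xfs/libxfs/xfs_da_format.h; it is not the kernel recovery code.

/*
 * Minimal user-space sketch: where the dir3 block magic and the block
 * tail live inside a 64k directory block. Struct definitions are
 * simplified copies of the on-disk layout in
 * fs/xfs/libxfs/xfs_da_format.h, not the kernel code.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <endian.h>		/* be64toh() (glibc/musl) */
#include <arpa/inet.h>		/* ntohl() for be32 fields */

#define XFS_DIR3_BLOCK_MAGIC	0x58444233u	/* "XDB3" */
#define DIR_BLKSIZE		65536		/* -n size=65536 */

/* leading fields of struct xfs_dir3_blk_hdr (all big-endian on disk) */
struct dir3_blk_hdr {
	uint32_t	magic;
	uint32_t	crc;
	uint64_t	blkno;
	/* lsn, uuid and owner follow on disk */
};

/* struct xfs_dir2_block_tail: the last 8 bytes of the block */
struct dir2_block_tail {
	uint32_t	count;	/* count of leaf entries */
	uint32_t	stale;	/* count of stale leaf entries */
};

int main(void)
{
	/* first 16 bytes copied from the corrupted buffer dump above */
	static unsigned char buf[DIR_BLKSIZE] = {
		0x58, 0x44, 0x42, 0x33, 0x00, 0x00, 0x00, 0x00,
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x20,
	};
	struct dir3_blk_hdr hdr;
	struct dir2_block_tail tail;

	/* the magic recovery validates is in the first logged chunk... */
	memcpy(&hdr, buf, sizeof(hdr));
	printf("magic 0x%08x %s, blkno 0x%llx\n", ntohl(hdr.magic),
	       ntohl(hdr.magic) == XFS_DIR3_BLOCK_MAGIC ? "(XDB3)" : "(bad)",
	       (unsigned long long)be64toh(hdr.blkno));

	/* ...but btp->count is ~64KiB away, in a different 128-byte chunk */
	memcpy(&tail, buf + DIR_BLKSIZE - sizeof(tail), sizeof(tail));
	printf("btp->count %u at offset %zu\n", ntohl(tail.count),
	       DIR_BLKSIZE - sizeof(tail));
	return 0;
}

Built with a plain cc invocation, this prints the XDB3 magic and blkno
0x1020 (matching the "xfs_dir3_block block 0x1020" complaint above)
from the header chunk, and a btp->count read from a chunk 65528 bytes
away; a buf log item that contains only one of those two chunks will
recover one value but not the other.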