Recently Zorro tripped over a failure with 64kB directory blocks on
s390x via generic/648. Recovery was reporting failures like this:

 XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
 XFS (loop3): Starting recovery (logdev: internal)
 XFS (loop3): Bad dir block magic!
 XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
 ....

Or it was succeeding and later operations were detecting directory
block corruption during directory operations such as:

 XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020
 XFS (loop3): Unmount and run xfs_repair
 XFS (loop3): First 128 bytes of corrupted metadata buffer:
 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3...........
 ....

Further triage and diagnosis pointed to the fact that the test was
generating a discontiguous (multi-extent) directory block and that
directory block was not being recovered correctly when it was
encountered.

Zorro captured a trace, and what we saw in the trace was a specific
pattern of buffer log items being processed through every phase of
recovery:

 xfs_log_recover_buf_not_cancel: dev 7:0 daddr 0x2c2ce0, bbcount 0x10, flags 0x5000, size 2, map_size 2
 xfs_log_recover_item_recover: dev 7:0 tid 0xce3ce480 lsn 0x300014178, pass 1, item 0x8ea70fc0, item type XFS_LI_BUF item region count/total 2/2
 xfs_log_recover_buf_not_cancel: dev 7:0 daddr 0x331fb0, bbcount 0x58, flags 0x5000, size 2, map_size 11
 xfs_log_recover_item_recover: dev 7:0 tid 0xce3ce480 lsn 0x300014178, pass 1, item 0x8f36c040, item type XFS_LI_BUF item region count/total 2/2

The item addresses, tid and LSN change, but the order of the two buf
log items does not. These are both "flags 0x5000" which means both log
items are XFS_BLFT_DIR_BLOCK_BUF types, and they are both partial
directory block buffers, and they are discontiguous.
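
As an aside, here is a minimal userspace sketch of how "flags 0x5000"
decodes to a directory block type. The constants are written from
memory of fs/xfs/libxfs/xfs_log_format.h and should be treated as
assumptions to check against the tree; blf_type() is a hypothetical
helper, not a kernel function.

```c
#include <stdint.h>

/*
 * Assumption: these values mirror fs/xfs/libxfs/xfs_log_format.h, where
 * the buffer type occupies the top 5 bits of the 16-bit blf_flags field.
 * Check them against the tree you are working on.
 */
#define XFS_BLFT_BITS	5
#define XFS_BLFT_SHIFT	11
#define XFS_BLFT_MASK	(((1u << XFS_BLFT_BITS) - 1) << XFS_BLFT_SHIFT)

/* Only the type of interest is listed; the value is an assumption. */
#define XFS_BLFT_DIR_BLOCK_BUF	10u

/* Hypothetical helper: extract the buffer type from a blf_flags value. */
static unsigned int blf_type(uint16_t blf_flags)
{
	return (blf_flags & XFS_BLFT_MASK) >> XFS_BLFT_SHIFT;
}
```

Under those assumptions, blf_type(0x5000) evaluates to 10, which is the
directory block buffer type.
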
Both buf log items also have different types of log items both before
and after them, so it is likely these are two extents within the same
compound buffer.

The way we log compound buffers is that we create a buf log format
item for each extent in the buffer, and then we log each range as a
separate buf log format item. IOWs, to recovery these fragments of the
directory block appear just like complete regular buffers that need to
be recovered.

Hence what we see above is that the first buffer (daddr 0x2c2ce0,
bbcount 0x10) is the first extent in the directory block, which
contains the header and magic number, so it recovers and verifies just
fine. The second buffer is the tail of the directory block, and it
does not contain a magic number because it starts mid-way through the
directory block. Hence the magic number verification fails and the
buffer is not recovered.

Compound buffers were logged this way so that they didn't require a
change of log format to recover. Prior to compound buffers, the
directory code kept its own dabuf structure to map multiple extents in
a single directory block, and they got logged as separate buffer log
format items as well. So the problem isn't directly related to the
conversion of dabufs to compound buffers - the problem is related to
the buffer recovery verification code not knowing that directory
buffer fragments are valid recovery targets.

Hence the fixes in this patchset are to log recovery, and do not
change runtime behaviour at all.

The first thing we do is change the buffer recovery code to consider a
type mismatch between the BLF and the buffer contents as a fatal error
instead of a warning. If we just warn and continue, the recovered
metadata may still be corrupt and so we should just abort with
-EFSCORRUPTED when this occurs. That addresses the case where recovery
silently succeeded and directory corruption was only detected later at
runtime.

We then need to untangle the buffer recovery code a bit.
Inode buffer, dquot buffer and regular buffer recovery are all a bit
different, but they are tightly intertwined. Neither dquot nor inode
buffer recovery needs discontiguous buffer recovery detection, and
they also have different constraints, so separate them out. We also
always recover inode and dquot buffers, so we don't need to check
magic numbers or decode internal LSNs to determine if they should be
recovered.

With that done, we can then add code to the general buffer recovery to
detect partial block recovery situations. We check the BLF type to
determine if it is a directory buffer, and add a path for recovery of
partial directory block items. This allows recovery of regions of
directory blocks that do not start at offset 0 in the directory block.
This fixes the initial "bad dir block magic" issue reported, and
results in correct recovery of discontiguous directory blocks.

IOWs, this appears to be a log recovery problem and not a runtime
issue. I think the fix will be to allow directory blocks to fail the
magic number check if and only if the buffer length does not match the
directory block size (i.e. it is a partial directory block fragment
being recovered).

This passes repeated looping over '-g enospc -g recoveryloop' on 64kB
directory block size configurations, so the change to recovery hasn't
caused any obvious regressions in fixing this issue.

Thoughts?
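
For discussion, the "if and only if the buffer length does not match
the directory block size" rule above can be sketched in userspace C as
follows. This is only an illustration under stated assumptions, not the
code in the patches: enum recover_action and classify_dir_buf() are
hypothetical names, and treating a buffer larger than the directory
block as corrupt is my assumption rather than something stated above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes; kernel recovery is not structured this way. */
enum recover_action {
	RECOVER_VERIFIED,	/* full block, magic number checked */
	RECOVER_PARTIAL,	/* directory block fragment, magic check skipped */
	RECOVER_CORRUPT,	/* abort recovery, i.e. -EFSCORRUPTED */
};

/*
 * Classify a buffer whose BLF type says it is a directory block.
 * buf_bytes is the length of the logged buffer, dir_blksize is the
 * directory block size, and magic_ok says whether a valid directory
 * magic number was found at offset 0 of the buffer.
 */
static enum recover_action
classify_dir_buf(size_t buf_bytes, size_t dir_blksize, bool magic_ok)
{
	/* Assumption: a buffer larger than a directory block is never valid. */
	if (buf_bytes > dir_blksize)
		return RECOVER_CORRUPT;

	/* A fragment may start mid-block, so it need not carry a magic number. */
	if (buf_bytes < dir_blksize)
		return RECOVER_PARTIAL;

	/* A complete directory block must still pass the magic number check. */
	return magic_ok ? RECOVER_VERIFIED : RECOVER_CORRUPT;
}
```

With a 64kB directory block, the bbcount 0x58 (45056 byte) fragment
from the trace classifies as RECOVER_PARTIAL whether or not a magic
number is found, while a full 64kB buffer with a bad magic number is
still rejected.
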