Metadata CRC error detected at xfs_dir3_block_read_verify+0x9e/0xc0 [xfs], xfs_dir3_block block 0x86f58

Manfred Spraul <manfred@xxxxxxxxxxxxxxxx> · Sun, 13 Mar 2022 16:47:19 +0100

Hello all,

After a simulated power failure, I observed:

>>>

Metadata CRC error detected at xfs_dir3_block_read_verify+0x9e/0xc0 [xfs], xfs_dir3_block block 0x86f58
[14768.047531] XFS (loop0): Unmount and run xfs_repair
[14768.047534] XFS (loop0): First 128 bytes of corrupted metadata buffer:
[14768.047537] 00000000: 58 44 42 33 9f ab d7 f4 00 00 00 00 00 08 6f 58  XDB3..........oX

<<<
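To make the dump easier to read, here is a minimal sketch that decodes the first 16 bytes of the buffer. It assumes the usual v5 directory block header layout (4-byte magic, 4-byte CRC, 8-byte big-endian block number); that layout is an assumption here, not something taken from the log, and the CRC is shown as raw bytes without interpreting its byte order.

```python
import struct

# First 16 bytes of the dumped buffer, copied from the kernel message above.
raw = bytes.fromhex("58 44 42 33 9f ab d7 f4 00 00 00 00 00 08 6f 58")

# Assumed layout: 4-byte magic, 4-byte CRC, 8-byte big-endian block number.
magic, crc_bytes, blkno = struct.unpack(">4s4sQ", raw)

print(magic.decode("ascii"))  # XDB3
print(crc_bytes.hex())        # 9fabd7f4
print(hex(blkno))             # 0x86f58
```

Under that assumption, the block number stored in the header matches the block number in the error message, so the buffer at least identifies itself as the block that was requested.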

Is this a known issue?

The image file is here: 
https://github.com/manfred-colorfu/nbd-datalog-referencefiles/blob/main/xfs-02/result/data-1821799.img.xz

A first question:

Are 512-byte sectors supported, or does XFS assume that 4096-byte writes are atomic?

How the power failures were simulated:

I added support to nbd to log all write operations, including the written data. This was merged into nbd-3.24.

I've used that to create a log of running dbench (+ a few tar/rm/manual 
tests) on a 500 MB image file.

In total, there are 2.9 million 512-byte sector writes. The datalog is ~1.5 GB in size.

If the first 1,821,799, 1,821,800, 1,821,801 or 1,821,802 blocks are replayed, the error message listed above is shown.

After 1,821,798 or 1,821,803 sectors, everything is OK.

(block numbers are 0-based)
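The replay idea can be sketched as follows. This is only an illustration: the real nbd datalog format is not reproduced here, and the list of (offset, data) tuples as well as the `replay_prefix` helper are hypothetical stand-ins for it.

```python
SECTOR = 512

def replay_prefix(image, writes, count):
    """Apply the first `count` (offset, data) sector writes to `image` in place."""
    for offset, data in writes[:count]:
        image[offset:offset + len(data)] = data
    return image

# Hypothetical log: one 4096-byte request split into eight 512-byte writes.
old = bytes([0xAA]) * 4096
new = bytes([0xBB]) * 4096
writes = [(i * SECTOR, new[i * SECTOR:(i + 1) * SECTOR]) for i in range(8)]

# Stop the replay after four sectors to simulate the power failure.
torn = replay_prefix(bytearray(old), writes, 4)
print(torn[:2048] == new[:2048])  # True: the first four sectors are new
print(torn[2048:] == old[2048:])  # True: the last four sectors are still old
```

Stopping the replay in the middle of a multi-sector request leaves a block that is half old and half new, which is the situation the question about 4096-byte atomicity is about.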

> H=2400000047010000 C=0x00000001 (NBD_CMD_WRITE+NONE) O=0000000010deb000 L=00001000
block 1821795 (0x1bcc63): writing to offset 283029504 (0x10deb000), len 512 (0x200).
block 1821796 (0x1bcc64): writing to offset 283030016 (0x10deb200), len 512 (0x200).
block 1821797 (0x1bcc65): writing to offset 283030528 (0x10deb400), len 512 (0x200).  << OK
block 1821798 (0x1bcc66): writing to offset 283031040 (0x10deb600), len 512 (0x200).  FAIL
block 1821799 (0x1bcc67): writing to offset 283031552 (0x10deb800), len 512 (0x200).  FAIL
block 1821800 (0x1bcc68): writing to offset 283032064 (0x10deba00), len 512 (0x200).  FAIL
block 1821801 (0x1bcc69): writing to offset 283032576 (0x10debc00), len 512 (0x200).  FAIL
block 1821802 (0x1bcc6a): writing to offset 283033088 (0x10debe00), len 512 (0x200).  << OK
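The offsets in this listing can be related to the block number in the kernel message with a little arithmetic. The sketch below assumes that the reported block number is counted in 512-byte units; the helper `sectors_applied` is introduced here purely for illustration.

```python
SECTOR = 512

write_offset = 0x10deb000  # O= field of the NBD_CMD_WRITE line
write_length = 0x1000      # L= field of the NBD_CMD_WRITE line
first_block = 1821795      # 0-based log index of the request's first sector

daddr = write_offset // SECTOR
sectors_in_write = write_length // SECTOR
last_block = first_block + sectors_in_write - 1

def sectors_applied(replayed_count):
    """Sectors of this request present after replaying the first replayed_count log blocks."""
    return max(0, min(sectors_in_write, replayed_count - first_block))

print(hex(daddr))                    # 0x86f58
print(sectors_in_write, last_block)  # 8 1821802
for count in (1821798, 1821799, 1821802, 1821803):
    print(count, sectors_applied(count))
```

Under that assumption, the single 4096-byte request at offset 0x10deb000 covers exactly the block named in the CRC error, and it is fully applied only once 1,821,803 sectors have been replayed.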

The output from xfs_repair is below.

kernel: 5.16.12-200.fc35.x86_64

nbd: nbd-3.24-1.fc37.x86_64

mkfs options: mkfs.xfs /dev/nbd0 -m bigtime=1 -m finobt=1 -m rmapbt=1

mount options: mount -t xfs -o uqnoenforce /dev/nbd0 $tmpmnt

Generator script: 
https://github.com/manfred-colorfu/nbd-datalog-referencefiles/blob/main/xfs-02/generator/maketr

Further log files are also on GitHub: 
https://github.com/manfred-colorfu/nbd-datalog-referencefiles/tree/main/xfs-02/result

>>>

/dev/loop0: [0037]:17060 (/tmp/data-341131.img)
Phase 1 - find and verify superblock...
        - block cache size set to 759616 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 734 tail block 734
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
Metadata CRC error detected at 0x563aa27804c3, xfs_dir3_block block 0x86f58/0x1000
corrupt block 0 in directory inode 551205
        would junk block
no . entry for directory 551205
no .. entry for directory 551205
problem with directory contents in inode 551205
would have cleared inode 551205
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 3
        - agno = 2
        - agno = 0
corrupt block 0 in directory inode 551205
        would junk block
no . entry for directory 551205
no .. entry for directory 551205
problem with directory contents in inode 551205
would have cleared inode 551205
entry "COREL" in shortform directory 789069 references free inode 551205
would have junked entry "COREL" in directory inode 789069
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
entry "COREL" in shortform directory inode 789069 points to free inode 551205
would junk entry
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 551174, would move to lost+found
disconnected inode 551176, would move to lost+found
disconnected inode 551178, would move to lost+found
disconnected inode 551180, would move to lost+found
disconnected inode 551206, would move to lost+found
disconnected inode 551207, would move to lost+found
disconnected inode 551208, would move to lost+found
disconnected inode 551209, would move to lost+found
disconnected inode 551210, would move to lost+found
disconnected inode 551211, would move to lost+found
disconnected inode 551212, would move to lost+found
disconnected inode 551213, would move to lost+found
disconnected inode 551214, would move to lost+found
disconnected inode 551215, would move to lost+found
disconnected inode 551217, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 789069 nlinks from 11 to 10
No modify flag set, skipping filesystem flush and exiting.

<<<