[Bug 203943] New: ext4 corruption after RAID6 degraded; e2fsck skips block checks and fails

bugzilla-daemon@xxxxxxxxxxxxxxxxxxx · Fri, 21 Jun 2019 06:51:27 +0000

https://bugzilla.kernel.org/show_bug.cgi?id=203943

            Bug ID: 203943
           Summary: ext4 corruption after RAID6 degraded; e2fsck skips
                    block checks and fails
           Product: File System
           Version: 2.5
    Kernel Version: 4.19.52-gentoo
          Hardware: Intel
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: ext4
          Assignee: fs_ext4@xxxxxxxxxxxxxxxxxxxx
          Reporter: yann@xxxxxxxxxxx
        Regression: No

I use a 24 TB software RAID6 built from ten 3 TB HDDs. The array holds a
dm-crypt container, which in turn holds an ext4 FS [1].
One of the disks had errors and got kicked out of the array. Before I was able
to replace it, the ext4 FS began to throw errors like these:

EXT4-fs error (device dm-1): ext4_find_dest_de:1802: inode #3833864: block
61343924: comm nfsd: bad entry in directory: rec_len % 4 != 0 - offset=1000,
inode=2620025549, rec_len=30675, name_len=223, size=4096
EXT4-fs error (device dm-1): ext4_lookup:1577: inode #172824586: comm
tvh:tasklet: iget: bad extra_isize 13022 (inode size 256)
EXT4-fs error (device dm-1): htree_dirblock_to_tree:1010: inode #7372807: block
117967811: comm tar: bad entry in directory: rec_len % 4 != 0 - offset=104440,
inode=1855122647, rec_len=12017, name_len=209, size=4096 
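The values in these messages can be checked against the filesystem geometry from tune2fs -l further down. The Python sketch below is a simplified model of the directory-entry checks in ext4's __ext4_check_dir_entry() and the i_extra_isize bound in ext4_iget(), as they appear to work in 4.19-era kernels. It is an illustration, not the kernel code, and the helper names are hypothetical. The logged offset is relative to the directory rather than to the block, so the sketch assumes each entry starts at offset 0 within its block.

```python
# Simplified model of two ext4 sanity checks (names are hypothetical).
# Modeled on __ext4_check_dir_entry() and the i_extra_isize bound in
# ext4_iget(); this is an illustration, not kernel code.

EXT4_GOOD_OLD_INODE_SIZE = 128  # size of the original 128-byte inode
INODES_COUNT = 366268416        # "Inode count" from tune2fs -l


def dir_rec_len(name_len: int) -> int:
    """Smallest valid rec_len: 8-byte entry header plus name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(rec_len: int, name_len: int, inode: int,
                    offset_in_block: int, block_size: int = 4096):
    """Return the message for the first failing check, or None."""
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset_in_block + rec_len > block_size:
        return "directory entry overrun"
    if inode > INODES_COUNT:
        return "inode out of bounds"
    return None


def extra_isize_ok(extra_isize: int, inode_size: int = 256) -> bool:
    """The extra inode fields must fit inside the on-disk inode."""
    return EXT4_GOOD_OLD_INODE_SIZE + extra_isize <= inode_size


# Values taken from the two "bad entry in directory" messages above.
for rec_len, name_len, inode in [(30675, 223, 2620025549),
                                 (12017, 209, 1855122647)]:
    print(check_dir_entry(rec_len, name_len, inode, offset_in_block=0),
          inode > INODES_COUNT)   # rec_len % 4 != 0 True
print(extra_isize_ok(13022))      # False
```

Both logged inode numbers are also far above the filesystem's inode count, and both rec_len values exceed the 4096-byte block. These bytes therefore do not look like damaged directory entries so much as unrelated data read in place of directory blocks.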

I then ran e2fsck to check the FS for errors, but all it produced was dozens of
lines like the following:
German original: "Block %$b von Inode %$i steht in Konflikt mit kritischen
Metadaten, Blockprüfungen werden übersprungen."
Translated: "Inode %$i block %$b conflicts with critical metadata, skipping
block checks." 
It also filled up my RAM, so I added ~100 GB of swap space. After a few days of
scanning, the e2fsck process died.

Afterwards I was able to replace the disk and re-run e2fsck on the FS; after
2 hours, the ext4 FS was clean again.

To understand the problem, I marked one of the disks as faulty, and the ext4
errors suddenly reappeared. I added a spare disk and resynced the array, but
e2fsck cannot fix the errors; it only prints "Inode %$i block %$b conflicts
with critical metadata, skipping block checks." while using more and more RAM.

Used software versions:
Kernel 4.19.52-gentoo
e2fsprogs-1.44.5
mdadm-4.1

Please let me know if I should provide additional information.

Kind regards,
Yann

########################################## additional information

#e2fsck -c /dev/mapper/share 
e2fsck 1.44.5 (15-Dec-2018)
badblocks: invalid last block - 5860269055
/dev/mapper/share: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes
Inode %$i block %$b conflicts with critical metadata, skipping block checks.
[...]
Inode %$i block %$b conflicts with critical metadata, skipping block checks.
Signal (6) SIGABRT si_code=SI_TKILL 
e2fsck(+0x33469)[0x55ec4942a469]
/lib64/libc.so.6(+0x39770)[0x7f562449f770]
/lib64/libc.so.6(gsignal+0x10b)[0x7f562449f6ab]
/lib64/libc.so.6(abort+0x123)[0x7f5624488539]
/lib64/libext2fs.so.2(+0x18af5)[0x7f5624ac3af5]
e2fsck(+0x18c30)[0x55ec4940fc30]
/lib64/libext2fs.so.2(+0x1a1ec)[0x7f5624ac51ec]
/lib64/libext2fs.so.2(+0x1a589)[0x7f5624ac5589]
/lib64/libext2fs.so.2(+0x1b04b)[0x7f5624ac604b]
e2fsck(+0x1978c)[0x55ec4941078c]
e2fsck(+0x1b0f6)[0x55ec494120f6]
e2fsck(+0x1b1dc)[0x55ec494121dc]
/lib64/libext2fs.so.2(ext2fs_get_next_inode_full+0x8e)[0x7f5624adc53e]
e2fsck(e2fsck_pass1+0xa12)[0x55ec49412d22]
e2fsck(e2fsck_run+0x6a)[0x55ec4940b54a]
e2fsck(main+0xefd)[0x55ec4940703d]
/lib64/libc.so.6(__libc_start_main+0xeb)[0x7f5624489e6b]
e2fsck(_start+0x2a)[0x55ec494092ea]
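The badblocks line at the top of this output is worth a closer look: 5860269055 is exactly the filesystem's block count minus one, the last block e2fsck -c asks badblocks to scan. The sketch below is a minimal model that assumes badblocks parses that argument as a 32-bit unsigned value and rejects anything larger. parse_last_block is a hypothetical stand-in for that check, not badblocks' actual code.

```python
# Hypothetical model of the assumed badblocks range check on its
# "last block" argument: the value must fit in a 32-bit unsigned int.
UINT_MAX = 2**32 - 1


def parse_last_block(arg: str) -> int:
    """Reject block numbers that do not fit in 32 bits (assumed behavior)."""
    value = int(arg, 10)
    if not 0 <= value <= UINT_MAX:
        raise ValueError(f"invalid last block - {arg}")
    return value


block_count = 5860269056           # "Block count" from tune2fs -l
try:
    parse_last_block(str(block_count - 1))
except ValueError as err:
    print(err)                     # invalid last block - 5860269055

# With 4096-byte blocks, 32-bit block numbers cover at most 16 TiB.
print((UINT_MAX + 1) * 4096 // 2**40, "TiB")   # 16 TiB
```

If that assumption holds, the bad-block scan presumably never ran on this roughly 22 TiB filesystem, and everything after "Updating bad block inode." comes from the regular pass 1 check.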

---------------------------------------------------------------------

#tune2fs -l /dev/mapper/share
tune2fs 1.44.5 (15-Dec-2018)
Filesystem volume name:   <none>
Last mounted on:          /home/share
Filesystem UUID:          c5f0559d-e3bd-473f-abc0-7c42b3115897
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              366268416
Block count:              5860269056
Reserved block count:     0
Free blocks:              755383351
Free inodes:              363816793
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         2048
Inode blocks per group:   128
RAID stride:              128
RAID stripe width:        1024
Flex block group size:    16
Filesystem created:       Sat Mar 17 14:36:16 2018
Last mount time:          Fri Jun 21 05:25:34 2019
Last write time:          Fri Jun 21 05:30:27 2019
Mount count:              3
Maximum mount count:      -1
Last checked:             Thu Jun 20 08:55:17 2019
Check interval:           0 (<none>)
Lifetime writes:          139 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4
Directory Hash Seed:      4c37872e-3207-4ff4-8939-a428feaeb49f
Journal backup:           inode blocks
FS Error count:           20776
First error time:         Thu Jun 20 14:18:47 2019
First error function:     ext4_lookup
First error line #:       1577
First error inode #:      172824586
First error block #:      0
Last error time:          Fri Jun 21 05:53:24 2019
Last error function:      ext4_lookup
Last error line #:        1577
Last error inode #:       172824586
Last error block #:       0
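A quick consistency check of the superblock geometry above: the group count follows from Block count and Blocks per group, and the inode figures should then agree with it. This sketch uses only the values printed by tune2fs -l. It checks the superblock's arithmetic, not the group descriptors or inode tables that the errors point at.

```python
# Values copied from the tune2fs -l output above.
block_count = 5860269056
blocks_per_group = 32768
inodes_per_group = 2048
inode_count = 366268416
inode_size = 256
block_size = 4096
inode_blocks_per_group = 128

groups = -(-block_count // blocks_per_group)    # ceiling division: last group is partial
print(groups)                                   # 178842
print(groups * inodes_per_group == inode_count) # True
print(inodes_per_group * inode_size // block_size == inode_blocks_per_group)  # True
```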

---------------------------------------------------------------------

#cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md2 : active raid6 sde[0] sdc[9] sdb[8] sdj[7] sdi[6] sda[11] sdd[10] sdh[3] sdg[2] sdf[1]
      23441080320 blocks super 1.2 level 6, 512k chunk, algorithm 2 [10/10] [UUUUUUUUUU]
      bitmap: 0/22 pages [0KB], 65536KB chunk

md1 : active raid1 sdk2[0] sdl2[2]
      52396032 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sdk1[0] sdl1[2]
      511680 blocks super 1.2 [2/2] [UU]

unused devices: <none>
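The md2 and ext4 geometry can be compared in the same way. The sketch below assumes that mdstat's block counts are in 1 KiB units and that the ext4 filesystem fills the whole dm-crypt device. Under those assumptions, any gap between md2 and the filesystem is space reserved by dm-crypt. The report itself does not say how the container was set up.

```python
# Values copied from /proc/mdstat and tune2fs -l above.
md2_kib = 23441080320             # md2 size; mdstat reports 1 KiB blocks
raid_devices = 10                 # [10/10]
chunk_kib = 512                   # "512k chunk"
fs_blocks = 5860269056            # ext4 "Block count"
block_kib = 4                     # 4096-byte ext4 blocks
stride, stripe_width = 128, 1024  # "RAID stride" / "RAID stripe width"

data_disks = raid_devices - 2     # RAID6 uses two devices' worth for parity
md2_blocks = md2_kib // block_kib
gap_blocks = md2_blocks - fs_blocks

print(round(md2_kib * 1024 / 1e12, 1))                    # 24.0 (TB)
print(gap_blocks)                                         # 1024 (= 4 MiB)
print(chunk_kib // block_kib == stride)                   # True
print(stride * data_disks == stripe_width)                # True
print(gap_blocks * block_kib == chunk_kib * data_disks)   # True: gap is one full stripe
```

The stride and stripe-width hints stored in the superblock match the array's 512k chunk and its eight data disks. The sketch also shows the stated 24 TB capacity is consistent with md2's size.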
