[Also posted to linux-ext4]

I am encountering an unusual problem after an mdraid failure. I'll summarise briefly and can provide further details as required.

First, the context. This is a Debian 11 system (amd64) with current updates (kernel 5.10.136-1, util-linux 2.36.1). It has a 12-drive mdraid RAID5 for data, recently migrated to LSI 2308 HBAs. That matters because earlier this week, at around 13:00 local time (EST), four drives dropped out of the array. They were an entire HBA channel. Of course, mdraid didn't like that and stopped the arrays. I followed best practice and shut the system down first.

Further context: the filesystem on the array is ancient (I am vaguely proud of that), dating from 2001. It started as ext2, was converted to ext3, then to ext4, and finally to ext4 with the 64bit feature. Because I am paranoid, I always mount ext4 with nodelalloc and data=journal. The journal is external, on a RAID1 of SSDs. Within the last ~3 months I enabled metadata_csum, which is relevant to what follows: the filesystem had never had metadata_csum enabled before.

On reboot, the arrays would not reassemble. This was expected, because 4 of the 12 drives were marked faulty. So I re-created the array using the same parameters that were used when it was originally built. Unfortunately, I had a stupid moment and didn't specify metadata 0.90 in the re-create, so the array was created with metadata 1.2. Version 1.2 writes its superblock near the beginning of each component rather than at the end. I noticed, stopped the array again and re-created it with the correct 0.90, but the damage was done: the 256-byte + 12 * 20 header had been written at the beginning of each of the 12 components. Still, unless I am mistaken, this means that at worst 12 blocks (the second block of each component) were damaged, which shouldn't be too bad.
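For reference, here is a rough sketch of where each md superblock version sits on a component device. It is based on my reading of the md driver's offset rules, not on mdadm's own code:

- v0.90 goes in the last 64 KiB-aligned 64 KiB block of the device.
- v1.0 goes at least 8 KiB from the end, rounded down to 4 KiB.
- v1.1 goes at sector 0.
- v1.2 goes 4 KiB from the start.

The function name and the device sizes used are illustrative assumptions. Please check against the kernel source before relying on it.

```python
SECTOR = 512  # md works in 512-byte sectors


def md_superblock_offset(device_bytes: int, version: str) -> int:
    """Return the byte offset of the md superblock on one component (sketch)."""
    sectors = device_bytes // SECTOR
    if version == "0.90":
        # 64 KiB (128-sector) reserved area: round the size down to a
        # 64 KiB boundary, then step back one 64 KiB block.
        reserved = 128
        sb_sector = (sectors & ~(reserved - 1)) - reserved
    elif version == "1.0":
        # At least 8 KiB (16 sectors) from the end, aligned down to 4 KiB.
        sb_sector = (sectors - 16) & ~7
    elif version == "1.1":
        # Very start of the device.
        sb_sector = 0
    elif version == "1.2":
        # 4 KiB (8 sectors) from the start, i.e. the second 4 KiB block.
        sb_sector = 8
    else:
        raise ValueError(f"unknown metadata version: {version}")
    return sb_sector * SECTOR


if __name__ == "__main__":
    size = 1_000_204_886_016  # hypothetical "1 TB" component, in bytes
    for v in ("0.90", "1.0", "1.1", "1.2"):
        print(v, md_superblock_offset(size, v))
```

This shows why the 0.90 and 1.2 superblocks do not overwrite each other. It also shows that the accidental 1.2 create would have touched the start of each component, where 0.90 keeps filesystem data. It does not account for anything else a 1.2 create may write, such as an internal bitmap.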
The only other possibility is that mdraid also zeroed the 'blank space' it leaves AFTER the superblock and BEFORE the data, but according to the documentation it shouldn't do that.

In any case, I subsequently reassembled the array 'correctly', matching the previous order and settings, and I believe I got it right. I kept the array read-only and tried fsck -n, which gave me this:

  ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap
  fsck.ext4: Group descriptors look bad... trying backup blocks...

It then warns that it won't attempt journal recovery because it's in read-only mode. It declares the filesystem clean, with a reasonable-looking number of files and blocks.

If I try mount -t ext4 -o ro, I get:

  mount: /mnt: mount(2) system call failed: Structure needs cleaning.

So, before anything else, I ran fsck -nf to make sure that the REST of the filesystem is in one logical piece. THAT painted a very different picture. In pass 1, I get approximately 980k (almost 10^6) messages of the form

  Inode nnnnn passes checks, but checksum does not match inode

and ~2000 of

  Inode nnnnn contains garbage

plus some 'tree not optimised' messages, which, as I understand it, are not technically errors. After ~11 hours it switches to pass 1B and reports that inode 12 has a long list of duplicate blocks:

  Running additional passes to resolve blocks claimed by more than one inode...
  Pass 1B: Rescanning for multiply-claimed blocks
  Multiply-claimed block(s) in inode 12: 2928004133 [....]

After the list of multiply-claimed blocks, it ends with:

  e2fsck: aborted
  Error while scanning inodes (8193): Inode checksum does not match inode
  /dev/md123: ********** WARNING: Filesystem still has errors **********
  /dev/md123: ********** WARNING: Filesystem still has errors **********

So, what is my next step? I realise I should NOT have touched the original drives. I should have dd-ed them to images on a separate array and worked on those. Even so, I believe the only writes that occurred were the mdraid superblocks.
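The "trying backup blocks" step means e2fsck is falling back to a backup superblock and the group descriptor copies stored next to it. Here is a rough sketch of where those backups would normally live. It assumes:

- the classic sparse_super layout, with backups in group 1 and in groups that are powers of 3, 5 and 7;
- 4 KiB blocks, so first_data_block is 0;
- 32768 blocks per group.

The helper names are hypothetical. A filesystem using sparse_super2 or meta_bg would be laid out differently, and the real values for this filesystem should come from dumpe2fs run against an image, not from this sketch.

```python
def is_power_of(n: int, base: int) -> bool:
    # True if n == base**k for some k >= 1.
    if n < base:
        return False
    while n % base == 0:
        n //= base
    return n == 1


def backup_superblock_groups(group_count: int) -> list[int]:
    # Under sparse_super, backups live in group 1 and in groups that
    # are powers of 3, 5 or 7 (group 0 holds the primary copy).
    return [
        g
        for g in range(1, group_count)
        if g == 1 or any(is_power_of(g, b) for b in (3, 5, 7))
    ]


def backup_superblock_blocks(
    group_count: int, blocks_per_group: int = 32768, first_data_block: int = 0
) -> list[int]:
    # First block of each backup group, in filesystem-block units.
    return [
        first_data_block + g * blocks_per_group
        for g in backup_superblock_groups(group_count)
    ]


if __name__ == "__main__":
    print(backup_superblock_blocks(50))
```

These block numbers are the kind of value e2fsck's -b option expects. On a filesystem of this age and conversion history, treat them as candidates to verify, not as known-good locations.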
In any case, I am getting more drives so I can image the 'faulty' array and work on the images, leaving the original data alone.

Where do I go from here? I have had similar issues in the past, going back to the early 2000s, and I had a near-100% success rate by re-creating the arrays. What is different this time? Or is nothing different, and the problem is just the checksumming?

Thanks!