Hi,

I've got an 85TB ext4 filesystem which I'm unable to fsck. The only cases of the same error I could find appear to be due to an SD card "swallowing" writes (i.e., the card goes into a read-only mode but doesn't report the write failure).

crowsnest ~ # e2fsck -f /dev/lvm/home
e2fsck 1.45.4 (23-Sep-2019)
ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap
e2fsck: Group descriptors look bad... trying backup blocks...
/dev/lvm/home: recovering journal
e2fsck: unable to set superblock flags on /dev/lvm/home

/dev/lvm/home: ***** FILE SYSTEM WAS MODIFIED *****

/dev/lvm/home: ********** WARNING: Filesystem still has errors **********

I have also (using dumpe2fs) obtained the locations of the backup superblocks and tried the same against a few of them using -b (roughly as sketched further down). Adding -y (as suggested in at least one post) makes absolutely no difference; our understanding is that it simply answers yes to all questions, so we didn't expect it to have an impact, but decided it was worth a try anyway.

Looking at the code behind the "unable to set superblock flags" error, it appears to be in e2fsck/unix.c, specifically this:

1765         if (ext2fs_has_feature_journal_needs_recovery(sb)) {
1766                 if (ctx->options & E2F_OPT_READONLY) {
...
1771                 } else {
1772                         if (ctx->flags & E2F_FLAG_RESTARTED) {
1773                                 /*
1774                                  * Whoops, we attempted to run the
1775                                  * journal twice.  This should never
1776                                  * happen, unless the hardware or
1777                                  * device driver is being bogus.
1778                                  */
1779                                 com_err(ctx->program_name, 0,
1780                                         _("unable to set superblock flags "
1781                                           "on %s\n"), ctx->device_name);
1782                                 fatal_error(ctx, 0);
1783                         }

That comment has me somewhat confused. My assumption is that e2fsck replayed the journal and tried to clear the needs_recovery flag in the superblock, but after reading it back the flag was either unchanged or still wrong (in line with the description of the SD card I found online). None of our arrays are showing read-only in /proc/mdstat.

We did pick out this in the kernel bootup (we downgraded back to 5.1.15, which we're on currently, after experiencing major performance issues on 5.3.6 which 5.4.8 subsequently didn't seem to fix; the 4.14.13 kernel that was used previously is known to cause ext4 corruption of the kind we saw on the other filesystems):

[ 3932.271538] EXT4-fs (dm-7): ext4_check_descriptors: Block bitmap for group 404160 overlaps superblock
[ 3932.271539] EXT4-fs (dm-7): group descriptors corrupted!
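For reference, the backup-superblock attempts mentioned above looked roughly like this (a sketch only; the block number 32768 is illustrative, being where the first backup normally sits with a 4KiB block size and 32768 blocks per group, and the real locations came from our dumpe2fs output):

# List the backup superblock locations without writing anything.
dumpe2fs /dev/lvm/home | grep -i 'backup superblock'

# Retry the check against one of the backups, being explicit about the
# 4KiB block size so e2fsck doesn't have to guess it.
e2fsck -f -b 32768 -B 4096 /dev/lvm/home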
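To test the "writes not sticking" theory we've also been reading the superblock back after an e2fsck attempt to see whether needs_recovery is still set on disk. A minimal, read-only sketch (debugfs's -c opens the filesystem in catastrophic mode, skipping the damaged bitmaps; adjust 32768 to one of the backup locations from dumpe2fs):

# Dump just the superblock header and look at the feature flags; if
# needs_recovery is still listed after e2fsck has replayed the journal,
# the flag update never made it to (or back from) disk.
debugfs -c -R 'show_super_stats -h' /dev/lvm/home | grep -i features

# Cross-check the same flags in a backup superblock.
dumpe2fs -h -o superblock=32768 -o blocksize=4096 /dev/lvm/home | grep -i features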
I created a dumpe2fs file as well:

crowsnest ~ # dumpe2fs /dev/lvm/home > /var/tmp/dump2fs_home.txt
dumpe2fs 1.45.4 (23-Sep-2019)
dumpe2fs: Block bitmap checksum does not match bitmap while trying to read '/dev/lvm/home' bitmaps

It's available at https://downloads.uls.co.za/85T/dump2fs_home.txt.xz (1.2GB, md5:79b3250e209c067af2532d5324ff95aa, around 12GB extracted).

A strace of e2fsck -y -f /dev/lvm/home is at https://downloads.uls.co.za/85T/fsck.strace.txt (13MB, md5:60aa91b0c47dd2837260218eb774152d).

crowsnest ~ # tune2fs -l /dev/lvm/home
tune2fs 1.45.4 (23-Sep-2019)
Filesystem volume name:   <none>
Last mounted on:          /home
Filesystem UUID:          522a9faf-7992-4888-93d5-7fe49a9762d6
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr filetype meta_bg extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              356515840
Block count:              22817013760
Reserved block count:     0
Free blocks:              6874204745
Free inodes:              202183498
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         512
Inode blocks per group:   32
RAID stride:              128
RAID stripe width:        1024
First meta block group:   2048
Flex block group size:    16
Filesystem created:       Thu Jul 26 12:19:07 2018
Last mount time:          Sat Jan 18 18:58:50 2020
Last write time:          Sun Jan 26 11:38:56 2020
Mount count:              2
Maximum mount count:      -1
Last checked:             Wed Oct 30 17:37:27 2019
Check interval:           0 (<none>)
Lifetime writes:          976 TB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      876a7d14-bce8-4bef-9569-82e7d573b7aa
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0xfbd895e9

Infrastructure: 3 x RAID6 arrays, two of 12 x 4TB disks and one of 4 x 10TB disks (100TB usable in total). These are combined into a single VG using LVM and then carved up into a number of LVs, the largest of which is this 85TB chunk. We have tried in the past to carve this into smaller LVs but failed, so we're aware that this is very large and not ideal.

We did experience an assembly issue on one of the underlying RAID6 PVs; that has been resolved, and the disk that was giving issues has been scrubbed and rebuilt. From what we can tell based on the other filesystems, this did not affect data integrity, but we can't make that statement with 100% certainty. As such we are expecting some data loss here, but it would be better if we can recover at least some of the data.

The other filesystems that also reside on the PV affected by the RAID6 problem either received a clean bill of health or were successfully repaired by e2fsck. (The system did crash, however; it's unclear whether the RAID6 assembly problem was the cause or merely another consequence, and as a result, whether the corruption on the repaired filesystems was caused by the kernel or by the RAID.)

I'm continuing onwards with the e2fsck code to try and figure this out, but I'm hopeful that someone could perhaps provide some much-needed insight and pointers.

Kind Regards,
Jaco