Re: ext4 superblock checksum invalid after running resize2fs

Baokun Li <libaokun1@xxxxxxxxxx> · Tue, 3 Jan 2023 10:37:39 +0800

On 2023/1/3 8:35, Zsolt Murzsa wrote:
Hi!

I've had the same issue twice in the last couple of days with the resize2fs online expand function.
I have an md RAID 1 with an LVM volume on top, formatted with ext4. I resized the volume (from 4T to 5T), then ran resize2fs, which completed without error, and the file system got bigger.

After a few hours, I reset the machine (unsafely), due to some zombie processes, but after restarting, the system could not mount the filesystem.
I checked the disks, and ran some hardware checks, but I didn't find anything wrong. I thought the hard reset caused some problem.

The error was: "Superblock checksum does not match superblock". I tried several alternate superblocks, e2fsck, and testdisk, but nothing helped, even though dumpe2fs showed all the superblock data.
I started to restore from a backup.

In the meantime, I found the debugfs tool, which let me skip the checksum check and see all the folders and files, which I then restored to a separate disk.
I replaced the two drives, recreated the md RAID 1 and LVM, reformatted with ext4, and started copying the data back.

I ran out of space, so I expanded the LV and ran resize2fs again (from 3T to 5T). It ran successfully again, and the mounted file system is now 5T.
Then I ran an e2fsck.

"e2fsck -n /dev/vg1/data
e2fsck 1.46.5 (30-Dec-2021)
Warning!  /dev/vg1/data is mounted.
ext2fs_open2: Superblock checksum does not match superblock
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Superblock checksum does not match superblock while trying to open /dev/vg1/data

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
     e2fsck -b 8193 <device>
or
     e2fsck -b 32768 <device>"

I'm shocked it happened again.
I can currently read and write the files, but I suspect that I will not be able to mount the filesystem again.
In the first case, I couldn't find a simple solution, but is it possible to fix the checksum somehow?
It takes a lot of time to use debugfs to copy everything to another drive and back again.

My current kernel version: 5.19.17-1-pve.
I can attach all the superblocks (Both the first and second case), or any other information, if needed.

Best Regards,
Zsolt Murzsa

Hi Zsolt,

This patch in mainline may fix your problem:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a408f33e895e455f16cf964cb5cd4979b658db7b
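
To check whether the on-disk primary superblock really carries a stale checksum, a read-only sketch like the one below may help. It is not taken from the patch. It assumes the usual ext4 layout as I understand it: the primary superblock starts 1024 bytes into the device, s_magic is at offset 0x38, s_checksum occupies the last 4 bytes (offset 0x3FC), and the checksum is a CRC32C seeded with ~0 over the preceding 1020 bytes, with metadata_csum enabled. The helper names (crc32c_raw, superblock_checksums, read_superblock) are made up for illustration.

```python
import struct

# Assumed layout of the primary ext4 superblock (see the note above).
SUPERBLOCK_OFFSET = 1024   # bytes from the start of the device
SUPERBLOCK_SIZE = 1024
MAGIC_OFFSET = 0x38        # s_magic, little-endian 16-bit
CSUM_OFFSET = 0x3FC        # s_checksum, little-endian 32-bit
EXT4_MAGIC = 0xEF53


def _make_table():
    # Byte-wise lookup table for CRC32C (reflected polynomial 0x82F63B78).
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c_raw(seed, data):
    """CRC32C update step with no initial or final inversion."""
    crc = seed & 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc


def superblock_checksums(sb):
    """Return (stored, computed) checksums for a 1024-byte superblock."""
    if len(sb) != SUPERBLOCK_SIZE:
        raise ValueError("expected a 1024-byte superblock")
    (magic,) = struct.unpack_from("<H", sb, MAGIC_OFFSET)
    if magic != EXT4_MAGIC:
        raise ValueError("bad magic, not an ext2/3/4 superblock")
    (stored,) = struct.unpack_from("<I", sb, CSUM_OFFSET)
    computed = crc32c_raw(0xFFFFFFFF, sb[:CSUM_OFFSET])
    return stored, computed


def read_superblock(path):
    """Read the primary superblock from a device or image (read-only)."""
    with open(path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        return f.read(SUPERBLOCK_SIZE)
```

For example, run as root, `stored, computed = superblock_checksums(read_superblock("/dev/vg1/data"))` and compare the two values. This only reads the device. On a mounted filesystem the on-disk copy can lag behind the kernel's in-memory superblock, so treat a mismatch as a hint rather than proof.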

--
With Best Regards,
Baokun Li