On Sat, Apr 16, 2022 at 12:37:29AM +1000, Peter Urbanec wrote:
> # e2fsck -f -C 0 -b 32768 -z /root/20220415_2015_e2fsck-b_32768.e2undo
> /dev/md0
> e2fsck 1.46.4 (18-Aug-2021)
> Overwriting existing filesystem; this can be undone using the command:
>     e2undo /root/20220415_2015_e2fsck-b_32768.e2undo /dev/md0
>
> e2fsck: Undo file corrupt while trying to open /dev/md0
>
> The superblock could not be read or does not describe a valid ext2/ext3/ext4
> filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
> filesystem (and not swap or ufs or something else), then the superblock
> is corrupt, and you might try running e2fsck with an alternate superblock:
>     e2fsck -b 8193 <device>
>  or
>     e2fsck -b 32768 <device>

So the failure of "e2fsck -f -C 0 -b 32768 -z /root/e2fsck.e2undo
/dev/md0" appears to be a bug where e2fsck doesn't work correctly with
an undo file when using a backup superblock.  I can replicate this
using these commands:

    mke2fs -q -t ext4 /tmp/foo.img 2G
    e2fsck -b 32768 -z /tmp/undo /tmp/foo.img

Running e2fsck without the -z option succeeds.  The combination of the
-b and -z options seems to be broken.

As a workaround, I would suggest running e2fsck with -n, which will
open the block device read-only, e.g. "e2fsck -b 32768 -n /dev/mdXX".
If the changes e2fsck would make look safe, then you can run e2fsck
without the -n option.

As for what happened, I wasn't able to replicate the problem when
resizing an empty file system from 3906946560 to 5860419840 blocks,
using a 64-bit binary.  I stopped testing 32-bit builds quite a while
ago, and filling the file system would take more time than I have at
the moment.  I will note that 3906946560 fits in 32 bits, while
5860419840 is larger than 2**32.  So it could very well be some kind
of 32-bit block number overflow.
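
To make that arithmetic concrete, here is a minimal shell sketch.  It
only shows what each block count would look like if it were truncated
to 32 bits; it does not claim to show where, or whether, resize2fs
actually truncates anything.  The truncate32 helper is a hypothetical
name used for illustration, and the sketch assumes a shell with 64-bit
arithmetic (bash, or dash on a 64-bit system).

```shell
# Block counts taken from the resize described above.
old_blocks=3906946560
new_blocks=5860419840

# 2**32: the smallest value that no longer fits in an unsigned 32-bit integer.
limit=$(( 1 << 32 ))

# Hypothetical helper (not part of e2fsprogs): keep only the low 32 bits
# of a number, which is what storing it in a 32-bit variable would do.
truncate32() {
    echo $(( $1 & 0xFFFFFFFF ))
}

echo "2**32               = $limit"
echo "old count truncated = $(truncate32 "$old_blocks")"
echo "new count truncated = $(truncate32 "$new_blocks")"
```

The old count comes back unchanged, since it is below 2**32.  The new
count comes back as 1565452544, which is 5860419840 - 2**32, so any
code path that held it in a 32-bit variable would see a file system
far smaller than the real one.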

For better or for worse, I don't have the resources or time to test
the full set of combinations of file system features, and as I
mentioned, I don't test 32-bit builds any more, either.  Enterprise
distributions will provide paid support, but they tend to support
only a limited number of file system features, and not the full set
of combinations of features.

While I'm grateful that Gentoo users seem to be super adventurous in
terms of turning on every single feature they can find, and then
sending in bug reports so we can improve the case base --- there are
reasons why features like sparse_super2 and inline_data are not
enabled by default, and I feel bad when people turn on non-standard
features, and then lose data because they aren't doing backups and
they enabled features that haven't received as much testing as the
default features.

So I don't know if the problem was due to some kind of bug in
resize2fs caused by the use of 64-bit block numbers on a 32-bit
binary, or due to the enablement of the sparse_super2 feature.

But for now, let's see if we can recover your file system, hopefully
with minimal data loss.  And since using an undo file seems to be
problematic when running e2fsck -b, the alternatives are (a) doing a
full block-level backup of the file system before running e2fsck ---
which I know will be hard since this is a very large file system, and
(b) running e2fsck -n first and looking at what e2fsck would do.

I will look at why "e2fsck -b 32768 -z /tmp/foo.undo /tmp/foo.img"
fails, but I may not get to it for a week or two.

Cheers,

					- Ted