Re: e2fsck fails with unable to set superblock

Jaco Kroon <jaco@xxxxxxxxx> · Thu, 30 Jan 2020 12:55:22 +0200

Hi Ted,

On 2020/01/29 22:50, Theodore Y. Ts'o wrote:
> On Wed, Jan 29, 2020 at 06:35:41AM +0200, Jaco Kroon wrote:
>> Hi,
>>
>> Inode 181716301 block 33554947 conflicts with critical metadata,
>> skipping block checks.
>>
>> So the critical block stuff I'm guessing can be expected since a bunch
>> of those tree structures probably got zeroed too.
> It's possible that this was caused by the tree structures getting
> written with garbage (33554947 is not zero, so it's not the extent
> tree structure getting zeroed, by definition).  If metadata checksums
> are enabled, then the kernel would notice (and flag them with EXT4-fs
> error reports) if extent trees were not correctly set up.
>
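
For reference, a quick sketch of where block 33554947 falls. The geometry here is an assumption, not something stated in this thread: 4 KiB blocks (hence 32768 blocks per group) and a first data block of 0. The real values should come from the superblock (e.g. dumpe2fs -h).

```python
# Assumed geometry (not stated in this thread): 4 KiB blocks, so
# 8 * 4096 = 32768 blocks per group, and s_first_data_block = 0.
BLOCKS_PER_GROUP = 32768
FIRST_DATA_BLOCK = 0


def locate_block(block, blocks_per_group=BLOCKS_PER_GROUP,
                 first_data_block=FIRST_DATA_BLOCK):
    """Return (group, offset_within_group) for an absolute block number."""
    return divmod(block - first_data_block, blocks_per_group)


group, offset = locate_block(33554947)
print(f"block 33554947 -> group {group}, offset {offset}")
# Under the assumed geometry this prints: block 33554947 -> group 1024, offset 515
```
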
> Another possibility is that the heuristics you used for guessing how to
> reconstruct the block group descriptors were incorrect.  Note that if
> the file system has been grown, using on-line or off-line resize2fs,
> the results may not be identical to how mke2fs would have laid out the
> block groups.  So trying to use the existing pattern
> of block group descriptors to reconstruct missing ones is fraught with
> potential problems.

So my code did some extra work in that it regenerated existing ones too,
and the only issues it picked up were with those GDs which were "all
zero".  So I'm fairly confident that what I've done is OK.  The
descriptions on the links I've previously posted made more and more
sense as I re-read them a few times, and were spot on with what was found
on disk for non-damaged GDT blocks.  Other than bg_flags ... which
Andreas explained quite well.
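
To illustrate the "all zero" check, here is a minimal sketch (not my actual recovery code). It assumes the raw GDT bytes have already been read into memory, and that the descriptor size is 64 bytes, as is typical with the 64bit feature; the real size comes from s_desc_size in the superblock. The function name is made up for illustration.

```python
def find_zeroed_descriptors(gdt: bytes, desc_size: int = 64) -> list:
    """Return the indices of group descriptors that are entirely zero.

    gdt       -- raw bytes of the group descriptor table (assumed already read)
    desc_size -- bytes per descriptor; 64 is an assumption, check s_desc_size
    """
    if desc_size <= 0:
        raise ValueError("desc_size must be positive")
    zero = bytes(desc_size)
    count = len(gdt) // desc_size  # any trailing partial descriptor is ignored
    return [
        i for i in range(count)
        if gdt[i * desc_size:(i + 1) * desc_size] == zero
    ]


# Tiny synthetic table: descriptor 1 is zeroed, 0 and 2 are not.
sample = b"\x01" * 64 + bytes(64) + b"\x02" * 64
print(find_zeroed_descriptors(sample))
```

An all-zero descriptor is only a hint of damage; the regenerate-and-compare step described above is what gives confidence in the undamaged ones.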
>
> If the file system has never been resized, and if you have exactly the
> same version of e2fsprogs used to initially create the file system,
> and if you have the exact same version of /etc/mke2fs.conf, and the
> exact same command-line options to mke2fs, you might be able to use
> "mke2fs -S" (see the mke2fs manpage) to rewrite the superblock and
> block group descriptors.  But if any of the listed assumptions can't
> be assured, it's a dangerous thing to do.

It has been resized a few times, always online.  Generally in increments of 1TB at a
time. I can't remember all the arguments and stuff though, and I have
definitely upgraded e2fsprogs in the meantime.

Hehehe, dangerous is an option at this point in time: compared to
reformatting and definitely losing all the data, I can only win.  The LVM
snapshots are helpful w.r.t. being able to roll back, and it can't get
worse than "complete data loss", which is where I'm currently at.

>
>> Another idea is to use debugfs to mark inode 181716301 as deleted, but
>> I'm not sure that's safe at this stage?
> Well, you'll lose whatever was in that inode, but it's more likely
> that the problem is that if the block group descriptors are incorrect,
> you'll cause even more damage.
>
> Did you make a full image backup of the good disks so you can revert any
> experiments that you might try?

LVM snapshot yes.  Don't have 85T just lying around elsewhere to dd onto.

>
> Good luck,
>
> 					- Ted
>
> P.S.  For future reference, please take a look at the man page of
> e2image for how you can back up the ext4's critical metadata blocks.
>
This is great!  I'll definitely add that to my bag of tricks,
especially for this particular server which houses most of our backups
for other hosts.
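
A rough sketch of how such a metadata backup could be scripted. The device and image paths below are hypothetical placeholders, and the helper names are made up. The basic "e2image device image-file" form is taken from the man page, which also advises keeping the image on a different filesystem than the one being backed up.

```python
import subprocess


def e2image_command(device: str, image_path: str) -> list:
    """Build the argv for a plain e2image metadata dump (no extra options)."""
    if not device or not image_path:
        raise ValueError("device and image_path are both required")
    return ["e2image", device, image_path]


def backup_metadata(device: str, image_path: str) -> None:
    """Run e2image; raises CalledProcessError if the tool reports failure."""
    subprocess.run(e2image_command(device, image_path), check=True)


# Hypothetical paths, for illustration only; nothing is executed here.
print(" ".join(e2image_command("/dev/vg0/backups", "/mnt/other/backups.e2i")))
```
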

Kind Regards,
Jaco