Re: [long] major problems on fs; e2fsck running out of memory

Keith Keller <kkeller@xxxxxxxxxxxxxxxxxxxxxxxxxx> · Sun, 1 Jun 2014 19:43:12 -0700

Hi Bodo and Ted,

Thank you both for your responses; they confirm what I thought might be
the case.  Knowing that, I can try to proceed with your suggestions.  I
do have some followup questions for you:

On Sun, Jun 01, 2014 at 09:05:09PM -0400, Theodore Ts'o wrote:
> Unfortunately, there has been a huge number of bug fixes for ext4's
> online resize since 2.6.32 and 1.42.11.  It's quite possible that you
> hit one of them.

Would this scenario be explained by these bugs?  I'd expect that if a
resize2fs failed, it would report a problem pretty quickly.  (But
perhaps that's the nature of some of these bugs.)

> Well, actually it's not quite that simple.  There are multiple passes
> to e2fsck, and the first pass is estimated to be 70% of the total
> e2fsck run.  So 51.8% reported by the progress means e2fsck had gotten
> 74% of the way through pass 1.  So that would mean that it had gotten
> through the inodes associated with roughly the first 3.9TB of the file system.

Aha!  Thanks for the clarification.  That's certainly well beyond the
original fs size.
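A minimal sketch of the progress arithmetic above, assuming (as Ted describes) that pass 1 occupies the first 70% of e2fsck's overall progress and that progress advances linearly within pass 1. The function name pass1_fraction is hypothetical and is not part of e2fsprogs.

```python
def pass1_fraction(overall_progress: float, pass1_share: float = 0.70) -> float:
    """Return the fraction of pass 1 completed.

    Assumes (hypothetically) that pass 1 covers the first `pass1_share`
    of the overall progress bar and that progress is linear inside it.
    """
    if not 0.0 <= overall_progress <= pass1_share:
        raise ValueError("overall progress is outside the pass 1 range")
    return overall_progress / pass1_share


# 51.8% overall progress, with pass 1 weighted at 70% of the run
print(f"{pass1_fraction(0.518):.0%}")  # prints 74%
```

Under the linearity assumption, multiplying this fraction by the file system size gives the rough "how far into the fs" estimate quoted above.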

> That being said, it's pretty clear that portions of the inode table
> and block group descriptors were badly corrupted.  So I suspect there
> isn't going to be much that can be done to try to repair the file
> system completely.  If there are specific files you need to recover,
> I'd suggest trying to recover them first before trying to do anything
> else.  The good news is that around 75% of your files can probably
> be recovered.

So, now when I try to mount, I get an error:

# mount -o ro -t ext4 /dev/mapper/vg1--sdb-lv_vz /vz/
mount: Stale NFS file handle

That's clearly a spurious error, so I checked dmesg:

# dmesg|tail
[159891.219387] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42252 failed (36703!=0)
[159891.219586] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42253 failed (51517!=0)
[159891.219786] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42254 failed (51954!=0)
[159891.220025] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42496 failed (37296!=0)
[159891.220225] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42497 failed (31921!=0)
[159891.220451] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42498 failed (2993!=0)
[159891.220650] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42499 failed (59056!=0)
[159891.220850] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 42500 failed (28571!=22299)
[159891.225762] EXT4-fs (dm-0): get root inode failed
[159891.227436] EXT4-fs (dm-0): mount failed

and before that there are many other checksum failures.  When I
try a rw mount I get these messages instead:

[160052.031554] EXT4-fs (dm-0): ext4_check_descriptors: Checksum for group 0 failed (43864!=0)
[160052.031782] EXT4-fs (dm-0): group descriptors corrupted!

Are there any other options I can try to force the mount so I can try to
get to the changed files?  If that'll be challenging, I'll just sacrifice
those files, but if it'd be relatively straightforward I'd like to make
the attempt.

Thanks again!

--keith

-- 
kkeller@xxxxxxxxxxxxxxxxxxxxxxxxxx

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users