ext4 problem (Group descriptor checksum invalid)

Hi,

I have a recurring problem that I've run into a few times now. Each time it seems to be fixed but later turns up again, so I figured I would ask here whether anyone knows of a permanent fix, or whether it is perhaps caused by a bug somewhere in the ext4 code. Sorry in advance for the lengthy writeup; I'm not sure which details are relevant, so I've tried to include them all.

I have a software RAID 5 array that was originally created with just 3 disks of 1 TB each. On this array I created an ext4 filesystem using

mke2fs -t ext4 -b 4096 -E stride=16 /dev/md2
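For reference, stride is the RAID chunk size expressed in filesystem blocks. The chunk size is not stated in this message, so the sketch below assumes 64 KiB, which is what stride=16 with 4096-byte blocks would correspond to. It also shows how the matching stripe width (stride times the number of data disks) changes as a RAID 5 array is grown. The function name raid_layout is made up for illustration.

```python
# Sketch only: the 64 KiB chunk size is an assumption inferred from
# stride=16 and -b 4096; it is not stated in the original message.

def raid_layout(chunk_kib, block_bytes, total_disks, parity_disks=1):
    """Return (stride, stripe_width) in filesystem blocks."""
    stride = chunk_kib * 1024 // block_bytes   # blocks per chunk
    data_disks = total_disks - parity_disks    # RAID 5 has one parity disk
    return stride, stride * data_disks

# The array in this message went from 3 to 6 disks.
for disks in (3, 4, 5, 6):
    stride, stripe_width = raid_layout(64, 4096, disks)
    print(f"{disks} disks: stride={stride} stripe_width={stripe_width}")
```

The stride stays constant across the grow operations, while the stripe width that would match the array changes with every added disk.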

I then grew the array (while mounted and in use) by doing

mdadm --add /dev/md2 /dev/sdX1
mdadm --grow /dev/md2 --raid-devices=4 --backup-file=/root/mdadm_grow_backup

After waiting for completion I grew the filesystem as well (still mounted and in use)

resize2fs -p /dev/md2

This all went well, and after everything had completed I unmounted and ran e2fsck -f /dev/md2, which reported no problems. I repeated the growing process twice more, so I now have six 1 TB disks in the array. After the second grow and resize I got an error from e2fsck: it complained that "Group descriptor 0 checksum is invalid", repeated for every group descriptor number. After e2fsck fixed it, everything mounted fine. The final grow and resize did not generate the error.

Now I often (but not always) get that same error again when I reboot my server. During boot, mount complains that /dev/md2 is the wrong fs type or something similar (sorry, I didn't capture the exact error), and I then have to run e2fsck manually to get the filesystem fixed and mounted. The following was logged today during the most recent occurrence of the problem:

----
Jul 28 08:58:39 deimos EXT4-fs: ext4_check_descriptors: Block bitmap for group 9088 not in group (block 3632981051)!
Jul 28 08:58:39 deimos EXT4-fs: group descriptors corrupted!
----
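To put the block number in that log line in context, here is a rough sketch that computes which blocks group 9088 should cover and compares the reported bitmap location against it. It assumes 32768 blocks per group (the usual mke2fs default for 4096-byte blocks, not stated in this message) and takes the total block count from the e2fsck summary further down. The helper names are made up for illustration.

```python
# Sketch only: BLOCKS_PER_GROUP is an assumed default, not a value
# reported anywhere in the original message.
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE   # one bitmap block tracks 32768 blocks
TOTAL_BLOCKS = 1220949920           # from the e2fsck summary line
FIRST_DATA_BLOCK = 0                # 0 for block sizes above 1024 bytes

def group_block_range(group):
    """Return the (first, last) block numbers covered by a block group."""
    first = FIRST_DATA_BLOCK + group * BLOCKS_PER_GROUP
    last = min(first + BLOCKS_PER_GROUP, TOTAL_BLOCKS) - 1
    return first, last

def classify(block, group):
    """Describe where a block number lies relative to a group and the fs."""
    first, last = group_block_range(group)
    if first <= block <= last:
        return "inside group"
    if block < TOTAL_BLOCKS:
        return "outside group, inside filesystem"
    return "beyond end of filesystem"

print(group_block_range(9088))
print(classify(3632981051, 9088))
```

Under these assumptions the reported bitmap block is not just outside group 9088 but past the last block of the filesystem, so the descriptor the kernel read cannot be valid as it stands.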

I did e2fsck manually:


----
deimos ~ # e2fsck /dev/md2
e2fsck 1.41.3 (12-Oct-2008)
e2fsck: Group descriptors look bad... trying backup blocks...
Group descriptor 0 checksum is invalid.  Fix<y>? yes

Group descriptor 1 checksum is invalid.  Fix<y>? yes

Group descriptor 2 checksum is invalid.  Fix<y>? yes

Group descriptor 3 checksum is invalid.  Fix<y>? yes

...
----

I've seen this before, so I add -y to the e2fsck...

----
...


Group descriptor 37258 checksum is invalid.  Fix? yes

Group descriptor 37259 checksum is invalid.  Fix? yes

Group descriptor 37260 checksum is invalid.  Fix? yes

/dev/md2 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
----

At this point my terminal was flooded with output, but what I can see in my 20k lines scrollback is a whole bunch of:

----
Free blocks count wrong for group #30621 (32254, counted=1912).
Fix? yes

Free blocks count wrong for group #30622 (32254, counted=1625).
Fix? yes

Free blocks count wrong for group #30623 (32254, counted=1849).
Fix? yes

Free blocks count wrong for group #30624 (32254, counted=1456).
Fix? yes
----

Followed by some of these:

----
Free inodes count wrong for group #96 (734, counted=1159).
Fix? yes

Directories count wrong for group #96 (826, counted=836).
Fix? yes

Free inodes count wrong for group #97 (5647, counted=6852).
Fix? yes

Directories count wrong for group #97 (117, counted=86).
Fix? yes
----

e2fsck finally completed:

----
/dev/md2: ***** FILE SYSTEM WAS MODIFIED *****
/dev/md2: 14929/305242112 files (85.3% non-contiguous), 1149206734/1220949920 blocks
deimos ~ # mount /dev/md2
deimos ~ #
----
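As a sanity check on the summary line, the sketch below derives the number of block groups from the block count and compares it with the highest group descriptor number e2fsck reported (37260). As before, 32768 blocks per group is an assumed default rather than a value given in this message.

```python
# Sketch only: BLOCKS_PER_GROUP is an assumed mke2fs default.
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 32768
TOTAL_BLOCKS = 1220949920      # from the e2fsck summary line
TOTAL_INODES = 305242112       # from the e2fsck summary line
LAST_DESCRIPTOR_SEEN = 37260   # last "Group descriptor N" line in the output

# Ceiling division: the final group may be only partially filled.
group_count = -(-TOTAL_BLOCKS // BLOCKS_PER_GROUP)
inodes_per_group, leftover = divmod(TOTAL_INODES, group_count)
size_bytes = TOTAL_BLOCKS * BLOCK_SIZE

print("groups:", group_count)
print("matches last descriptor:", group_count - 1 == LAST_DESCRIPTOR_SEEN)
print("inodes per group:", inodes_per_group, "leftover:", leftover)
print("filesystem size in bytes:", size_bytes)
```

If the assumption holds, the group count, the inode count per group, and the roughly 5 TB size (five data disks of 1 TB each) all agree with one another, which suggests the superblock geometry is sane after the repair.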

The filesystem mounted and everything looks fine. As on the previous occasions, it seems I've had no data loss (I hope that is true; at least I've not noticed any missing or corrupted files).

Now the question I have is: what is causing this? Is it a known problem that has already been fixed? What should I do to avoid running into it in the future? Was it caused by resize2fs and then never properly fixed by the e2fsck runs, which would explain why it keeps popping up again?

Some versions:

----
deimos ~ # uname -a
Linux deimos 2.6.29-gentoo-r5 #2 SMP Wed Jun 17 20:55:58 CEST 2009 i686 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
deimos ~ # mdadm --version
mdadm - v2.6.8 - 28th November 2008
deimos ~ # e2fsck -V
e2fsck 1.41.3 (12-Oct-2008)
        Using EXT2FS Library version 1.41.3, 12-Oct-2008
----

I hope there is some resolution for this. Even though I seem to get the FS back every time without data loss, it is still a bit scary. Thanks in advance for any help, and let me know if there is more data I should provide.

BR,

/Fredrik Pettersson