Re: ext4 won't mount - fsck required - 2nd fsck in less than a week

Lukáš Czerner <lczerner@xxxxxxxxxx> · Tue, 11 Sep 2012 13:59:47 -0400 (EDT)

On Tue, 11 Sep 2012, Terry wrote:

> Date: Tue, 11 Sep 2012 11:22:27 -0500
> From: Terry <td3201@xxxxxxxxx>
> To: Theodore Ts'o <tytso@xxxxxxx>
> Cc: linux-ext4@xxxxxxxxxxxxxxx
> Subject: Re: ext4 won't mount - fsck required - 2nd fsck in less than a week
> 
> On Mon, Sep 10, 2012 at 8:56 AM, Terry <td3201@xxxxxxxxx> wrote:
> > On Mon, Sep 10, 2012 at 8:48 AM, Terry <td3201@xxxxxxxxx> wrote:
> >> On Sun, Sep 9, 2012 at 10:18 PM, Terry <td3201@xxxxxxxxx> wrote:
> >>> On Sun, Sep 9, 2012 at 9:53 PM, Terry <td3201@xxxxxxxxx> wrote:
> >>>> On Sun, Sep 9, 2012 at 9:47 PM, Theodore Ts'o <tytso@xxxxxxx> wrote:
> >>>>> On Sun, Sep 09, 2012 at 09:34:10PM -0500, Terry wrote:
> >>>>>>
> >>>>>> As the subject says, we have a 15 TB fsck drive that won't mount with
> >>>>>> these errors:
> >>>>>>
> >>>>>> Sep 9 20:02:20 narf kernel: EXT4-fs (dm-9): ext4_check_descriptors:
> >>>>>> Inode bitmap for group 3200 not in group (block 4161027887)!
> >>>>>> Sep 9 20:02:20 narf kernel: EXT4-fs (dm-9): group descriptors corrupted!
> >>>>>
> >>>>> These indicate a very basic file system corruption where the block
> >>>>> group descriptors are corrupted.  E2fsck will complain immediately
> >>>>> upon seeing this sort of fs inconsistency, and the first thing it will
> >>>>> try to do is fix it.
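
[Annotation] To make the "not in group" message concrete, here is a minimal Python sketch of the kind of range check that ext4_check_descriptors appears to apply to each bitmap block. This is a simplified model and not the kernel code. The filesystem size (exactly 15 TiB), the 4 KiB block size, the 32768 blocks per group, and the flex_bg setting are all assumptions, since none of them are stated in the thread. The function and constant names are made up for illustration.

```python
# Simplified model of the per-group bitmap range check; NOT kernel code.
# Every constant below is an assumption: the thread states none of them.
ASSUMED_BLOCK_SIZE = 4096                 # bytes per block (assumed)
ASSUMED_BLOCKS_PER_GROUP = 32768          # usual value for 4 KiB blocks (assumed)
ASSUMED_TOTAL_BLOCKS = 15 * 2**40 // ASSUMED_BLOCK_SIZE  # "15 TB" taken as 15 TiB


def bitmap_block_in_range(group, block, total_blocks=ASSUMED_TOTAL_BLOCKS,
                          blocks_per_group=ASSUMED_BLOCKS_PER_GROUP,
                          first_data_block=0, flex_bg=True):
    """Return True if `block` is an acceptable bitmap location for `group`.

    With flex_bg, a bitmap may sit anywhere inside the filesystem.
    Without it, the bitmap must sit inside its own block group.
    """
    # Number of block groups, rounded up.
    groups_count = -(-(total_blocks - first_data_block) // blocks_per_group)

    if flex_bg:
        first = first_data_block
    else:
        first = first_data_block + group * blocks_per_group

    if flex_bg or group == groups_count - 1:
        last = total_blocks - 1
    else:
        last = first + blocks_per_group - 1

    return first <= block <= last


if __name__ == "__main__":
    # Group and block numbers taken from the kernel message quoted above.
    for flex in (True, False):
        ok = bitmap_block_in_range(3200, 4161027887, flex_bg=flex)
        print(f"flex_bg={flex}: in range = {ok}")
```

Under these assumptions the reported block lies beyond the end of the filesystem, so the check fails with or without flex_bg. If the device is actually larger than assumed, the flex_bg case could come out differently.
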
> >>>>>
> >>>>>> We did a proactive fsck on Tuesday of last week because it was
> >>>>>> starting to give filesystem errors. It ran through and mounted fine.
> >>>>>>
> >>>>>> The filesystem lives on an equallogic SAN spread across 36 drives.
> >>>>>> Could this be something with the physical layer or is it not abnormal
> >>>>>> to have to run multiple rounds of fsck to fully fix an issue?
> >>>>>
> >>>>> This is most probably a hardware problem; normally e2fsck will fix
> >>>>> file system corruptions (and certainly problems such as corrupt block
> >>>>> group descriptors) in a single pass.  If e2fsck finished and the file
> >>>>> system mounted fine last week, and now you're getting this kind of
> >>>>> error, it basically screams some kind of physical layer problem, or
> >>>>> perhaps a bad hard drive, or perhaps the SAN disk is getting
> >>>>> incorrectly written to by some other system, etc.
> >>>>>
> >>>>>                                      - Ted
> >>>>
> >>>> Thanks for the reply.  It is part of a RHEL cluster but we did not
> >>>> have any situations where multiple systems mounted the filesystem.  It
> >>>> is an old SAN so perhaps we have a physical issue. We'll see what
> >>>> happens with this pass.
> >>>
> >>> While I am waiting for fsck to finish, another thought. This
> >>> filesystem contains a lot of small files. 35,867,642 files to be
> >>> exact.  Anything else I should check or know to ensure a smooth
> >>> operation for these types of filesystems?  I formatted them with
> >>> standard RHEL 6 options.
> >>
> >> FSCK completed fixing a lot of things.  The file system then mounted
> >> without any errors.  We are still getting these types of errors in
> >> /var/log/messages:
> >>
> >> Sep 10 08:40:49 narf kernel: EXT4-fs error (device dm-6):
> >> ext4_dx_find_entry: bad entry in directory #743966900: directory entry
> >> across blocks - block=2975876794offset=0(946176), inode=1414751737,
> >> rec_len=45724, name_len=206
> >>
> >> Thoughts?
> >
> > Hold that thought.  This is another filesystem.  Let me fix that one
> > then come back to this problem if it still exists.
> 
> Ok, fixed the other filesystem (dm-6) yesterday.  Today, getting these
> errors still on it:
> Sep 11 11:17:47 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 90851: 0 blocks in bitmap, 5048
> in gd
> Sep 11 11:18:17 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 90670: 0 blocks in bitmap, 6665
> in gd
> Sep 11 11:19:31 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 37589: 420 blocks in bitmap,
> 8302 in gd
> Sep 11 11:19:31 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 71777: 7071 blocks in bitmap,
> 23711 in gd
> Sep 11 11:19:31 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 71778: 10664 blocks in bitmap,
> 26624 in gd
> Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 13499: 9884 blocks in bitmap,
> 1256 in gd
> Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 13498: 383 blocks in bitmap,
> 384 in gd
> Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 13496: 2356 blocks in bitmap,
> 10453 in gd
> Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 13497: 3593 blocks in bitmap,
> 5641 in gd
> Sep 11 11:19:50 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 49528: 25850 blocks in bitmap,
> 29946 in gd
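
As I understand these messages, each one reports that the number of free blocks counted from a group's block bitmap disagrees with the free count stored in its group descriptor ("gd"). To see how large the mismatches are per group, a small Python sketch like the one below could be run over the log. The function name and the sample input are only for illustration; the regular expression assumes the exact message wording shown above, which may differ between kernel versions.

```python
import re

# Matches the message body quoted above, e.g.
# "ext4_mb_generate_buddy: EXT4-fs: group 90851: 0 blocks in bitmap, 5048 in gd"
PATTERN = re.compile(
    r"ext4_mb_generate_buddy: EXT4-fs: group (\d+): "
    r"(\d+) blocks in bitmap, (\d+) in gd"
)


def parse_buddy_mismatches(log_text):
    """Return a list of (group, in_bitmap, in_gd, delta), delta = in_gd - in_bitmap."""
    # Drop mail quote markers, then re-join lines wrapped by the mail client.
    unquoted = re.sub(r"^[> ]+", "", log_text, flags=re.MULTILINE)
    flat = " ".join(unquoted.split())

    results = []
    for group, in_bitmap, in_gd in PATTERN.findall(flat):
        group, in_bitmap, in_gd = int(group), int(in_bitmap), int(in_gd)
        results.append((group, in_bitmap, in_gd, in_gd - in_bitmap))
    return results


# Two entries copied from the log above, still wrapped and quoted.
SAMPLE = """\
> Sep 11 11:17:47 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 90851: 0 blocks in bitmap, 5048
> in gd
> Sep 11 11:19:39 omadvnfs01a kernel: EXT4-fs error (device dm-6):
> ext4_mb_generate_buddy: EXT4-fs: group 13499: 9884 blocks in bitmap,
> 1256 in gd
"""

if __name__ == "__main__":
    for group, in_bitmap, in_gd, delta in parse_buddy_mismatches(SAMPLE):
        print(f"group {group}: bitmap={in_bitmap} gd={in_gd} delta={delta:+d}")
```

In a real run the text would come from /var/log/messages rather than a pasted sample. The sign of the delta shows whether the descriptor claims more or fewer free blocks than the bitmap, and the log above contains both cases.
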

Hi, what RHEL version are you using, or even better, what kernel
version are you using? If you have a RHEL subscription, you should
definitely contact Red Hat about the issue.

Thanks!
-Lukas

> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 