Re: ext3 filesystem corruption - more info

Sev Binello <sev@xxxxxxx> · Wed, 12 Apr 2006 20:20:48 -0400

Hi -

I did answer but forgot to cc all, apologies.

Here's the communication I had with Andreas Dilger ....

   Andreas Dilger wrote:

      On Apr 11, 2006  14:25 -0400, Sev Binello wrote:

          Does this imply you have a 6TB ext3 filesystem?

         No, it is divided into 6 filesystems, the largest ~ 1.8TB

I wouldn't exactly trust the 2.4 kernel for devices larger than 2TB.
Some SCSI drivers also had problems over 2TB due to signed/unsigned
issues.  Is the 6TB of storage split into < 2TB LUNs by the hardware,
or is it a single 6TB block device (with CONFIG_LBD) that is partitioned
by Linux?  The latter case would be in "not very well tested" waters.
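
[Editorial sketch] The 2TB figure and the signed/unsigned remark above can be checked with a little arithmetic. This is only an illustration under stated assumptions, not a diagnosis of this system: it assumes 512-byte sectors and a 32-bit sector counter, and it takes the "~1.8TB" from this thread as decimal terabytes. The helper names are made up for the example.

```python
SECTOR_BYTES = 512   # assumption: classic 512-byte sectors
TIB = 2 ** 40

def max_device_bytes(counter_bits, signed):
    """Largest device addressable with a sector counter of the given width."""
    usable_bits = counter_bits - 1 if signed else counter_bits
    return (2 ** usable_bits) * SECTOR_BYTES

def as_signed32(n):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    n &= 0xFFFFFFFF
    return n - 2 ** 32 if n >= 2 ** 31 else n

unsigned_limit = max_device_bytes(32, signed=False)
signed_limit = max_device_bytes(32, signed=True)
print(unsigned_limit / TIB)   # 2.0
print(signed_limit / TIB)     # 1.0

# A ~1.8TB LUN (assumed decimal TB) expressed in 512-byte sectors.
lun_sectors = int(1.8e12) // SECTOR_BYTES
print(lun_sectors)                  # 3515625000
print(lun_sectors < 2 ** 32)        # True: fits an unsigned 32-bit counter
print(lun_sectors > 2 ** 31)        # True: overflows a signed 32-bit counter
print(as_signed32(lun_sectors - 1)) # last sector goes negative if treated as signed
```

So a device just under 2TB is within the unsigned limit, but code that handles sector numbers as signed 32-bit values would already misbehave on it.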

The 6TB is split by the RAID hardware into 6 LUNs, so Linux sees
each of them as a device smaller than 2TB.

Would it be a problem if the two 1.8TB filesystems appeared on one host?

Thanks

-Sev

Damian Menscher wrote:
I've seen similar errors when attempting to have a >2TB
filesystem on a 32-bit RHEL3 machine.  We have since implemented a
3.5TB filesystem on a 64-bit RHEL4 machine.

It would help if you could answer the question Andreas Dilger posed:

"Does this imply you have a 6TB ext3 filesystem?"

Damian

On Wed, 12 Apr 2006, Sev Binello wrote:

Hi -

In case this helps, we got the following messages from EXT3 before
the filesystem went

Does anyone recognize these?

//seems to mount okay

   Mar 25 17:52:30 acnlin82 kernel: EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,33), internal journal

   Mar 25 17:52:30 acnlin82 kernel: EXT3-fs: recovery complete.

   Mar 26 00:04:01 acnlin82 kernel: EXT3-fs: mounted filesystem with ordered data mode.

//as soon as the NFS clients start, we get a TON of errors like this

Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: Freeing blocks not in datazone - block = 3443589120, count = 1

Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: Freeing blocks not in datazone - block = 2113834232, count = 1

Mar 26 00:07:22 acnlin82 kernel: EXT3-fs error (device sd(8,49)): ext3_free_blocks: bit already cleared for block 49125

//interspersed with some of these

Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device

Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1891463980, limit=1722264358

Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device

Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1824250576, limit=1722264358

Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device

Then we had to reboot, and the filesystem is basically shot.
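
[Editorial sketch] The numbers in the messages above can be decoded with a few lines of Python. This is a hedged illustration, not a root-cause analysis. Assumptions: major 8 is the first SCSI disk major with 16 minors per disk, "08:31" is major:minor in hex, "want" and "limit" are in 1 KiB units (as 2.4 kernels are understood to report them), and the filesystem spans the whole device. The function names are invented for the example.

```python
def scsi_disk_name(major, minor):
    """Map a (major, minor) pair to the conventional sdXN name (major 8 only)."""
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    disk = chr(ord("a") + minor // 16)   # 16 minors per disk: sda, sdb, ...
    part = minor % 16                    # 0 would mean the whole disk
    return f"sd{disk}{part if part else ''}"

def parse_hex_dev(text):
    """Parse a 'MM:mm' device string whose fields are hexadecimal."""
    major, minor = (int(field, 16) for field in text.split(":"))
    return major, minor

print(scsi_disk_name(8, 33))                    # sdc1
print(scsi_disk_name(8, 49))                    # sdd1
print(scsi_disk_name(*parse_hex_dev("08:31")))  # sdd1

LIMIT_KIB = 1722264358                  # "limit=" value, assumed 1 KiB units
WANTS_KIB = [1891463980, 1824250576]    # "want=" values from the log
FREED_BLOCKS = [3443589120, 2113834232] # "not in datazone" block numbers

device_tb = LIMIT_KIB * 1024 / 1e12
print(round(device_tb, 2))              # 1.76 (decimal TB)

overshoot = [want - LIMIT_KIB for want in WANTS_KIB]
print(overshoot)                        # [169199622, 101986218]

# Whatever the ext3 block size (1, 2 or 4 KiB), are the freed block
# numbers beyond the last block such a device could hold?
out_of_range = {
    bs_kib: all(block >= LIMIT_KIB // bs_kib for block in FREED_BLOCKS)
    for bs_kib in (1, 2, 4)
}
print(out_of_range)                     # {1: True, 2: True, 4: True}
```

Under those assumptions, sd(8,49) and 08:31 are the same partition (sdd1), it is about 1.76TB in size, and both the "want" offsets and the freed block numbers point past its end, which is consistent with corrupted block pointers rather than a device that is simply too large.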

Thanks

-Sev

Sev Binello wrote:

      Hi -

      We have had 3 rather major occurrences of ext3 filesystem corruption
      lately, i.e. so bad we couldn't even mount, and fsck didn't help.

      I am looking for pointers that could help us investigate the root
      cause.

      In general...

      We are running RedHat WS 3 Update 6, 2.4.21-40.2.ELsmp or
      2.4.21-37.ELsmp.

      We have a small SAN system that looks like this:

         3 NFS servers, each containing 2 QLogic HBAs connected to 2
         QLogic switches, connected to an nStor (now Xyratex) 6TB RAID
         system containing 2 (active-active) controllers.

      On the first 2 occasions one of the controllers was failed over.

      On a 3rd occasion both SAN switches lost power, and the hosts and
      RAID lost communication.

      On all occasions the QLogic failover driver tried to start up on the
      alternate HBA.

      On the first 2 instances we sort of tried to blame the controller.

      On the 3rd, that was harder to do since the RAID system and the
      hosts stayed up but lost communication.

      I can provide more detail if anyone has any info on how to proceed.

      Thanks

      -Sev

 -- 

Sev Binello

Brookhaven National Laboratory

Upton, New York

631-344-5647

sev@xxxxxxx


_______________________________________________

Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users