Re: ext3 on Linux software RAID1

Matti Aarnio <matti.aarnio@zmailer.org> · Sat, 2 Mar 2002 23:59:20 +0200

On Sat, Mar 02, 2002 at 12:18:44PM -0500, Rechenberg, Andrew wrote:
... 
> We just had a pretty bad crash on one of production boxes and the ext2
> filesystem on the data partition of our box had some major filesystem
> corruption.  Needless to say, I am now looking into converting the
> filesystem to ext3 and I have some questions regarding ext3 and Linux
> software RAID.
>
> I have read that previously there were some issues running ext3 on a
> software raid device (/dev/mdN), but that most of those issues are resolved
> by running kernel 2.4.x.  Currently we are running 2.4.16 on our production
> system and we have a rather complicated hardware/software RAID configuration
> on the box.

To a large extent what is called "ext3" should probably be called "ext2+j",
or some such, but "ext3" makes sense for many things.

You have probably seen my comments.

I did "solve" the problem by running my dual-CPU box with a uniprocessor
kernel.  Even an SMP kernel with the "nosmp" boot option (to run it with
only the boot processor) didn't work.

My troublesome machine is now co-located at a site where I don't
have daily access.

How much of the problem lies in the gcc version (RH 7.1 2.96-97.1), and
how much in the kernel (2.4.17 release), I don't know.

It seems to happen when there are:
  - lots of memory
  - at least 2 _fast_ processors
  - a process writing a very large file at full tilt
    (1.3+ times the memory size)

I never saw the problem on a 128 MB dual PPro200 machine,
even with the same disks that later did cause problems
with another motherboard and CPUs.

I haven't seen the problem on my home machine which is,
if possible, an even heftier box than the one which I do get
to hang -- except it has dual IDE disks on the same IDE cable,
instead of a separate dual channel IDE controller or SCSI
controller.   (Both of which have hung up on me on the other box.)

...
> - Is it wise to convert the filesystem on /dev/md1 to ext3? 
> 
> - Have the issues with ext3 on Linux RAID been resolved?

  Things I have seen in 2.4.18 test releases don't convince me.
  However, I haven't been able to test them either.  Maybe in
  a couple of weeks' time, but not yet.

> - Will the failing and resyncing of /dev/md1 happening on a daily basis
>   cause problems with the journalling? 

  It should be invisible.  Doing it at a quiet moment would be advisable,
  of course.   Running the resync will take heaps of time, though.

  Consider 252 GB being synced at 15 MB/sec (even that might be a bit
  optimistic with some disks): it takes a "mere" 16800 seconds,
  or 4h 40m.
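
  A quick way to redo that estimate for other sizes and rates is sketched
  below.  The function name resync_time is made up for illustration; it
  assumes 1 GB = 1000 MB and uses integer arithmetic, so results round down.

```shell
# resync_time SIZE_GB RATE_MB_PER_S
# Prints the estimated resync duration, assuming 1 GB = 1000 MB.
resync_time() {
    secs=$(( $1 * 1000 / $2 ))
    echo "${secs} s ($(( secs / 3600 ))h $(( secs % 3600 / 60 ))m)"
}

resync_time 252 15    # prints: 16800 s (4h 40m)
```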

  A much smarter internal design supporting a real RAID10 (instead
  of RAID1 on top of a pair of RAID0) might allow syncing all disks in
  parallel, which of course reduces the max speed, but might achieve
  the sync in about an hour or two.

> - Do you think the filesystem would be stable enough for 18x7 availability?

  EXT3, yes.  But you should probably ask RedHat for a supported Enterprise
  kernel.    There are other reasons why ReiserFS might make sense
  in your system, but unfortunately they don't help with backup...

  ..  or maybe...  See the site:  http://devlinux.com/namesys

  If reiserfs can do snapshots at the filesystem level, and you
  take backups at the filesystem level..  Then you can have heaps of
  small RAID1 pairs striped together with RAID0 -- e.g.  "RAID01"
  (or whichever way those hybrids are called..)

  ... but still, those don't help in case your system experiences
  the RAID1+EXT3 hangup which I kept seeing.

  When you have time, try it:  "tune2fs -j  /dev/md1"
  (have a suitable toolset at hand as well), then e2fsck, and mount
  with "-t ext3".
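
  The conversion steps above could be scripted roughly as follows.  This is
  only a sketch: the mount point /data, the helper "run", and the DRY_RUN
  switch are illustrative assumptions, and unmounting first is an added
  precaution so that e2fsck works on an idle filesystem.  By default it
  only prints the commands.

```shell
# Placeholders: /dev/md1 is from the question, /data is a hypothetical mount point.
DEV=${DEV:-/dev/md1}
MNT=${MNT:-/data}

# Print each command; execute it only when DRY_RUN=0 is set explicitly.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    fi
}

convert_to_ext3() {
    run umount "$MNT" || return 1
    run tune2fs -j "$DEV" || return 1    # add a journal to the existing ext2 filesystem
    run e2fsck -f "$DEV"                 # forced check before remounting
    [ $? -le 1 ] || return 1             # e2fsck: 0 = clean, 1 = errors corrected
    run mount -t ext3 "$DEV" "$MNT"
}
```

  Calling convert_to_ext3 as written just lists the four commands; running
  it as root with DRY_RUN=0 would actually execute them.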

  Then try writing a large file there, e.g.:
	dd if=/dev/zero bs=1024k of=test.file  count=12000

  If it does not hang after a few runs, you are probably safe.
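
  To repeat that write test a few times, a loop along these lines might do.
  The function name stress_write and its arguments are illustrative; the
  default of 12000 one-megabyte blocks is taken from the dd command above
  and should be sized so the file exceeds about 1.3 times the RAM.

```shell
# stress_write DIR [COUNT_MB] [RUNS]
# Writes a COUNT_MB-megabyte file of zeros into DIR, RUNS times in a row.
# A hang in dd is the failure being looked for, so there is no timeout.
stress_write() {
    dir=$1
    count=${2:-12000}
    runs=${3:-3}
    i=1
    while [ "$i" -le "$runs" ]; do
        echo "run $i of $runs"
        dd if=/dev/zero bs=1024k of="$dir/test.file" count="$count" || return 1
        sync
        i=$(( i + 1 ))
    done
    rm -f "$dir/test.file"
}

# Example (hypothetical mount point): stress_write /data 12000 3
```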

> - What kind of overhead is involved after the filesystem is ext3?

  Aside from the journal file, it is exactly the same as EXT2.
  Indeed you can  tune2fs   an EXT2 filesystem to be EXT3.
  I have done that.   It takes a bit of effort to get the root
  filesystem to switch to ext3 at boot, for some reason.  But you
  are not doing this for your boot system..

  Difficulties appear only with ext2 filesystems that have been
  in use for a long time with older kernels.

> - What journalling mode is suggested for this type of application/system
>   configuration?
> - What size journal would be appropriate given data=ordered vs. data=journal?
> - And any other suggestions/insights/comments.

  For these I don't have opinions.
  I have been using "safe and slow" mode (data=ordered), but I am not
  in a very great hurry most of the time.

  Indeed, at present one of my remotely located machines has been running
  RAID1 in degraded mode because 2 weeks ago one of its IDE disks
  failed, and I noticed it only yesterday (lacking automated monitoring.)
  It seems I can get to it in a week's time, but whether it will need
  replacing or just a power cycle, I have no idea...
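
  A minimal check for that kind of unnoticed degraded array could look like
  the sketch below.  It assumes the usual /proc/mdstat layout, where a
  status field such as [U_] marks a missing member; the function name
  degraded_arrays is made up for illustration.

```shell
# degraded_arrays [FILE]
# Prints the name of every md array whose status field (e.g. [U_])
# shows a missing member.  FILE defaults to /proc/mdstat.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/        { name = $1 }     # remember the current array
        /\[[U_]*_[U_]*\]/    { print name }    # "_" marks a missing member
    ' "${1:-/proc/mdstat}"
}
```

  Run from cron, any non-empty output could be mailed to the admin.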

> Below is our /etc/raidtab.  Let me know if you need any more information.
> Thank you in advance for all your assistance.
> 
> Regards,
> Andrew Rechenberg
> Network Team, Sherman Financial Group
> arechenberg@shermanfinancialgroup.com

/Matti Aarnio