External Journal scenario - good idea?

Hello everyone,

I've just recently joined the ext3-users list.  I spent much of the 
weekend browsing the list archives and whatever else I could find on 
the net about using an external journal and running in data=journal 
mode.  From what I have seen of what other folks are doing, 
data=journal with an external journal may help with our problem here.  

If I could pick the brains of the resident gurus for a moment and 
solicit some advice, I'd be grateful; thanks in advance to everyone who 
can take the time to offer an opinion.

We are running a file server, which currently has as its "hard drive" an 
ATA-to-SCSI external RAID subsystem.  The file server is a dual 
Pentium-III Tualatin 1.4GHz (512K cache) server, built on a Serverworks 
HESL-T chipset, with 2GB ECC Registered SDRAM.  

The RAID unit is a Promise UltraTrak100-TX8, with 8 Western Digital 
WD1200JB 120GB ATA100 7200rpm hard drives installed.  7 of the 8 drives 
are joined to a RAID5 array, the 8th is an unassigned hot spare.  The 
UltraTrak's SCSI interface is an Ultra2-LVD (80MB/sec) interface, 
connected via its external 68-pin MicroD cable, to a custom Granite 
Digital internal-to-external "Gold TPO" ribbon cable - which leads to 
the "B" channel of the onboard AIC7899W Ultra160 SCSI interface.  The 
RAID unit is the only SCSI device attached to this channel at this time, 
and is terminated with a Granite Digital SCSI-Vue active diagnostic 
terminator.  I have no indication or suspicion whatsoever of any SCSI 
bus problems.  (I have also run the same UltraTrak unit, with the same 
diagnostic terminator, against an AHA-2940U2W in the "old" file server, 
and saw the same write performance issues described below.)

Currently, the array is partitioned with a /boot partition, and a / 
partition, each as ext3 with the default data=ordered journaling mode. 
I have gradually begun to realize why it is a good idea to break the 
filesystem up into separate partitions and mount points, and may yet 
end up doing that.  But that's a rabbit to hunt another day, unless 
doing so turns out to be required to solve this problem.

This file server performs 5 key fileserver-related roles, since it 
hosts the large RAID5 file storage for the network:

1. Serves the mailboxes for our domain to the two frontend mail/web 
servers via NFS mount

2. Runs the master SQL server - the two mail/web servers run local slave 
copies of the mail account databases

3. Stores the master copy of web documents served by the web servers 
(and will replicate them to web servers when documents change, still 
working on this though)

4. Samba file server for storage needs on the network

5. Limited/restricted-access FTP server for web clients


For the most part, the file server runs great and does its job quite 
well.  However, there are two main circumstances in which things "go 
downhill" badly:

1. Daily maintenance-type cron events (like updatedb)

2. Other heavy file WRITE activity, such as when Samba clients are 
backing up their files to this server from the network.  We regularly 
have some very large files being copied over to the file server via 
Samba (1 GB drive image files, for example)

In both cases, or any other case of heavy file I/O (mainly writes), this 
server pretty much grinds to a halt.  It starts grabbing all of the 
available RAM for dirty page cache, presumably because the RAID unit 
cannot write it out that fast.  The inevitable flush is put off as long 
as possible, but eventually the backlog uses up all available system RAM 
(we have 2GB in this puppy now), and the kernel is forced to write 
synchronously to free up cache for fresh data coming in.  While this is 
going on, you might as well forget delivering or retrieving mail, or 
getting much of anything else out of the server.  We have seen "NFS 
Server Not Responding" errors, and MySQL errors too (from the vpopmail 
libs trying to look up the username/password and mailbox location).

Once the "emergency/panic" synchronous writeout is complete, the server 
goes back to running great (although Linux never seems to release RAM 
it has grabbed for cache until it absolutely HAS to give it up).

 From what I've been reading, this seems to be normal for 2.4-series 
kernels (I'm running a modded 2.4.18 on this server, patched with the 
various NFS patches plus recent iptables); they really like to use RAM 
for cache.  And I suppose that RAM works better doing SOMETHING than 
just sitting there looking pretty in the "available" column. ;)
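For what it's worth, on 2.4 kernels the flushing behavior can apparently 
be tuned somewhat through /proc/sys/vm/bdflush, so that background 
writeback starts earlier and in smaller bursts instead of one giant 
stall.  A sketch of what I mean (the field meanings and order vary 
between 2.4 releases, so the values here are illustrative only):

```shell
# Show the current bdflush parameters; on 2.4 kernels this is a single
# line of (typically nine) numeric fields.
cat /proc/sys/vm/bdflush

# Illustrative only: the first field (nfract) is the percentage of the
# buffer cache that may be dirty before bdflush starts background
# writeback, and nfract_sync (later in the line) is the point at which
# writers are forced to flush synchronously.  Lowering nfract makes
# writeback start sooner.  Check Documentation/sysctl/vm.txt for the
# exact field order on your kernel before writing anything back, e.g.:
#   echo "20 500 0 0 500 3000 40 0 0" > /proc/sys/vm/bdflush
```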

I also understand that RAID5 is not known for great write performance. 
Add to that ext3, whose journaling adds some overhead of its own.

We really need to solve this problem.  We're seeing "NFS Server not 
responding" errors in the logs every day during the maintenance runs, 
and pretty much any other time heavy disk activity is going on, so mail 
performance is suffering.  Mail users get username/password errors (it 
even tells them it couldn't contact the MySQL update server sometimes).

It's definitely not a server horsepower problem. ;)  But I can see how 
it could be a write-speed issue with the RAID unit.  Unless this is just 
the way the Linux kernel does things (which I'm afraid may be the case).

THOUGHTS ABOUT USING AN EXTERNAL DATA=JOURNAL SETUP
After reading many posts in the archives here, and whatever else I could 
find, I am considering setting up a separate pair of quick drives in a 
RAID1 array as an external journal, and mounting the root filesystem 
with data=journal.

This strikes me as a possible write-performance improver, if it allows 
the larger writes to be "satisfied" faster because they only have to 
hit the journal drive pair, without all the overhead of writing to the 
RAID5 array up front.  I realize that the data still has to be written 
to the main filesystem on the RAID5 array, and that this actually means 
more total work.  I'm just wondering whether flushing the journal out 
to the actual filesystem is more of a background activity that does not 
affect the responsiveness of the file server.  We would probably make 
the journal close to the full size of the RAID1 array (40GB?)
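For the mount side, my understanding (hedged, since I haven't tried it) 
is that the data= mode cannot be changed on a remount, so for the root 
filesystem the option may have to go on the kernel command line via 
rootflags rather than just in fstab.  A sketch, with the device name as 
a placeholder:

```
# /etc/fstab: root mounted with full data journaling
/dev/sda2   /   ext3   defaults,data=journal   1 1

# Since / is mounted before fstab is read, the boot loader may also
# need to pass the option (lilo.conf shown; grub takes the same option
# on its kernel line):
#   append="rootflags=data=journal"
```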

Does this seem like a viable option to improve or eliminate the server 
responsiveness problems?  Or do any of the gurus out there have any 
better suggestions?  We can't fit an NVRAM-based external journal device 
in the budget.

CAN WE CHANGE JOURNAL LOCATION ON EXISTING EXT3 PARTITIONS?
One other snag we may run into is that the / partition already has a 
journal (/.journal, I presume), since it's already an ext3 partition. 
Is it possible to tell the system we want the journal somewhere else 
instead?  It strikes me that when we're ready to move to the external 
journal, we may have to mount the / partition as ext2, remove the old 
journal, then create the new one and point the / partition at it with 
the e2fs tools?
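From my reading of the mke2fs and tune2fs man pages, the move looks 
feasible without reformatting; roughly like this (device names are 
placeholders, the filesystem must be unmounted, which for / means 
booting from rescue media, and as I understand it the journal is 
capped at 102400 filesystem blocks, i.e. 400MB at a 4K block size, so 
it could never fill a 40GB array anyway):

```shell
# 1. Format the RAID1 pair (/dev/md0 here, a placeholder) as a
#    dedicated external journal device.  The journal device's block
#    size should match the filesystem's.
mke2fs -O journal_dev -b 4096 /dev/md0

# 2. With the filesystem unmounted (for /, boot from rescue media),
#    drop the existing internal journal and re-check:
tune2fs -O ^has_journal /dev/sda2
e2fsck -f /dev/sda2

# 3. Attach the external journal to the filesystem:
tune2fs -j -J device=/dev/md0 /dev/sda2
```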

I welcome all thoughts, opinions, and suggestions, and I'll gladly 
provide whatever other details are necessary.

Thanks in advance,
vinnie

_______________________________________________

Ext3-users@redhat.com
https://listman.redhat.com/mailman/listinfo/ext3-users
