Re: External Journal scenario - good idea?


 



 > Hope things go well,
 >
 > Cheers
 > Jeremy
 >
 >
 >



Hi Jeremy,

Just wanted to take a minute to say thanks for all your advice with our 
situation.

Our problem ended up being a little bit of everything:

1. SPLIT UP THE FILESYSTEM
Originally, the whole filesystem (minus /boot) was on one huge partition 
(and therefore one really busy journal).  We split up the filesystem 
like this:

/
/boot
/tmp
/usr         Separate ext3 partitions on the external RAID unit,
/var         with symbolic links pointing /home and a few
/var/log     non-standard directory names under /storage1 (our
/storage1    large file area), so they don't have to fit under
             the / filesystem.

/intraid1    A separate RAID1 pair of internal drives, using
             Linux software RAID mdp-style (Neil Brown's patch
             to make partitionable arrays of drives), for the
             NFS-served mailboxes and soon-to-be MySQL database.
             This partition is running in data=journal mode, to
             cause NFS sync to be satisfied when data is written
             to the journal.
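For anyone curious about the mechanics, here is a rough sketch of what
the relevant /etc/fstab entries and the /home symlink might look like.
The device names and minor details are illustrative guesses, not copied
from my actual config:

    # External RAID unit: one ext3 partition per mount point
    /dev/sda5     /usr        ext3    defaults                1 2
    /dev/sda6     /var        ext3    defaults                1 2
    /dev/sda7     /var/log    ext3    defaults                1 2
    /dev/sda8     /storage1   ext3    defaults                1 2

    # Internal RAID1 pair (mdp-style partitionable array);
    # data=journal lets an NFS sync return as soon as the data
    # hits the journal
    /dev/md_d0p1  /intraid1   ext3    defaults,data=journal   1 2

    # /home actually lives on the big storage partition:
    #   ln -s /storage1/home /home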

Originally I did split off /usr/local as a separate mount
point/filesystem and put all the big storage dirs under that.  But then
I realized that there were several things under /usr/local (like
Apache) that I didn't really want stuck in a busy filesystem.  The
/usr/local tree was only taking up 75MB anyway, so I put it back under
/usr and remounted the big partition as /storage1.  (The MySQL RPM
package puts the database under /var/lib/mysql; that stays until I
customize the spec file and put it where I want it to go, under
/intraid1.)

Splitting up the filesystem made a HUGE difference in write performance 
and file access in general.  The system could actually pay attention to 
the /proc/sys/vm/bdflush tweaks, and the server doesn't peg itself 
anymore trying to write everything to disk.
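For reference, on a 2.4 kernel those bdflush knobs are set by writing
all nine fields at once.  The values below are only an illustration of
the kind of tuning I mean, not the numbers I settled on (field meanings
shift a bit between 2.4 releases; see Documentation/sysctl/vm.txt for
your kernel before copying anything):

    # nine fields; the first two (nfract, ndirty) control when bdflush
    # kicks in and how many dirty buffers it writes per pass
    echo "30 500 64 256 500 3000 60 0 0" > /proc/sys/vm/bdflush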

I tested RAID1, RAID0, RAID0+1, and RAID5 arrays on the external unit. 
Write performance was better with the RAID 0, 1, and 0+1 arrays,
particularly in how much bus activity it took to write the data to the
external array.  But something I found along the way (see item 2 below)
made RAID5 write performance on big write operations a lot better too,
and may explain why my 2.4.19 kernel build crashed so miserably on me,
with the same types of bus errors and such...
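If anyone wants to repeat the comparison, my big-write tests were
basically of this flavor - a crude sketch, not my exact commands or
sizes:

    # crude sequential-write test; write a lot more than RAM so the
    # page cache can't hide the array's real behavior
    time dd if=/dev/zero of=/storage1/testfile bs=1M count=4096
    time sync

watching the bus-activity light while it ran.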

2. SCSI BIOS "ALLOW DISCONNECT" SETTING FOR RAID UNIT'S SCSI ID
My Promise UltraTrak100-TX8 doesn't like having SCSI disconnect enabled 
for its SCSI ID in the controller BIOS (Adaptec 7899 U160 onboard). 
Since this is the only SCSI device on the bus at this time, I disabled 
"allow disconnect" for its SCSI ID, and found that it allowed mke2fs to 
complete writing the inode tables on the large partition.  Until I did 
this, the SCSI bus would hang, reset, and otherwise act ugly.  It would
really be a bummer if this unit can't work with "allow disconnect"
enabled, because if (WHEN) I add more SCSI devices to this channel,
disabling disconnect for the RAID unit will likely not be an option.
Guess I'd better keep that AHA2940U2W around... ;)  I noticed the same
heavy bus-activity light with the 2940U2W on the old server, but since
this Promise unit is only Ultra2 LVD (80 MB/s) anyway, it would be
better to give it its own bus if/when I put U160 SCSI drives in the
server.

3. NFS EXPORTS ON SEPARATE SPINDLES
Just doing the above two things straightened things out enough to allow 
the server to continue serving the NFS mailbox export well, even with 
one of us copying large sets of files (big and little) to the server. 
The way this panned out, we had not yet put the mailboxes on separate 
spindles.  But we were already in much better shape.

During "torture testing" (BOTH of us copying huge sets of files to the 
server at the same time, while also sending and retrieving emails with 
large attachments), NFS mailbox service was pretty bad.  The 
mail/webserver frontends had no problem queueing the messages on their 
local delivery queue, but couldn't hit the NFS mounts to deliver them to 
the mailboxes.  Little messages could get through, but big ones were "no 
dice".  Webmail was a joke, since the frontends couldn't hit the mailboxes.

We never both copy files to the file server at the same time like this, 
but I wanted to see what it could handle.  Everything ran great with 
only one of us copying big sets of files.  (But... but... this is a 
dual-1.4GHz Tualatin serverworks machine... I want to see that HIGH 
PERFORMANCE!!) ;)

I finally got the other RAID1 pair of 40GB IDE drives ready (on a 
66MHz-capable PCI IDE controller), and the mount points and symlinks 
taken care of, so we could put the mailboxes on separate spindles. 
While I was at it, I set up /etc/fstab to mount that array partition as 
data=journal.  The mailboxes being on separate drives made all the 
difference in the world.  Now BOTH of us can "samba" to our hearts' 
content, copying huge sets of files to the server, and the server keeps 
right on serving up the mailboxes to the mail/web frontends.  Running 
like a top.  Still, we never both do big file copy operations to the 
server at the same time.  But it's nice to know the system can handle 
it now. ;)  If all goes well and it does not represent a gross misuse of 
available funds, I hope to get some high-RPM U160 drives spinning out 
mailboxes someday, and save the IDE drives for less performance-critical 
machines.
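For completeness, the mailbox export on the server looks something like
the sketch below.  The path and frontend hostnames are made up for
illustration; the real point is the sync option, which makes the server
commit writes before replying - exactly where data=journal on the
underlying filesystem pays off:

    # /etc/exports (hypothetical path and hosts)
    /intraid1/mailboxes   frontend1(rw,sync) frontend2(rw,sync)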

Thanks again for your help, Jeremy.  We are running great now.  Still 
want to look at external journals, but we can wait a bit on that.

vinnie



_______________________________________________
Ext3-users@redhat.com
https://listman.redhat.com/mailman/listinfo/ext3-users
