Re: External Journal scenario - good idea?

>
> Yep, now (I think) I understand.  Since I have one large / filesystem,
> all writes go through the same "funnel".  All writes have to use the
> same journal, going to the same "drive" (array).  Since the same drives
> are involved writing to the shared dirs for SMB clients, as those which
> are involved with reads/writes to NFS mailbox dirs and other stuff, NFS
> requests and MySQL requests have to "get in line" with SMB requests when
> it's busy.
>


> Currently our complete usage of the single RAID5 array is right around
> 100GB.  It is mostly file storage/backups from other hosts on the
> network.  This will no doubt represent the largest file storage
> requirements of all the fileserver functions for this machine.
>
> In light of the smaller amount of space really needed for all of the
> other functions (combined), and the fact that for each 120GB drive we
> pull off the RAID5 array we will lose around 100GB of RAID5 storage
> capacity (though the drives would have to be removed from the array in
> PAIRS for each RAID1 array we were to create in this external 8-bay
> unit), it seems that the best usage of the external RAID enclosure and
> the 120GB drives we have in it, would be to create the other arrays
> elsewhere, and keep the large array for file storage.  If I am to keep a
> RAID5 array going - I'm going to have to think about this some and
> decide if I can settle for something else, like a RAID0+1 array, or
> smaller RAID1 arrays.
>
> As you said, using a pair of 120GB drives for each RAID1 array used for
> other storage purposes (mailboxes, ftp, SQL database) would be a really
> big waste of space.

Yeah, that's the biggest issue facing the use of these new large-capacity 
drives: figuring out how to keep performance acceptable while maximizing 
utilized space. Of course, at least with IDE drives, the cost of large 
capacity drives is still relatively low (compared to SCSI). I also suspect 
prices will drop even further, as Maxtor is readying a line of 320GB 
drives.


>
> Also, I'm not so sure I would be gaining much advantage to make RAID1
> arrays in the same external unit, assuming I still had a RAID5 array in
> the same unit.  That is, if what I am seeing has much or anything to do
> with the parity calculation speed of the RAID controller in this
> external subsystem.  If it is swamped with XOR calculations while
> writing to a 7 drive array, it would probably not be much less swamped
> calculating parity data for a 4-5 drive array, and even a separate RAID1
> array working behind the same RAID controller may suffer write
> performance issues because the data has to be processed by the same RAID
> controller to actually get written to the RAID1 drives.

I would hope that the controller design is such that it can handle enough 
operations per second to satisfy its host interface (the SCSI connection 
exported to the host machine). Some things to consider, if I may.

Remember, every large write (assuming it spans multiple strips) means that 
the controller must:

1> Break up the data and issue a write to physically record each data strip. 
On a 5-7 drive array, that's one write operation per data drive (4-6 writes 
for each full stripe).

2> Calculate the parity information (an XOR across all of the data strips) 
and generate one more write per stripe to physically record the parity strip 
on the remaining drive.

That's a series of data writes, followed by a parity calculation, followed 
by a parity write, for every stripe touched.
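
If it helps to see that burden spelled out, here's a rough Python sketch of 
the work one full-stripe write hands the controller. The strip size and 
drive count are illustrative assumptions, not values pulled from your 
enclosure's firmware:

    # Rough model of one full-stripe RAID5 write. Strip size and drive
    # count are made-up examples, not your controller's real settings.
    STRIP_SIZE = 64 * 1024      # hypothetical 64KB strip (chunk)
    DATA_DRIVES = 6             # 7-drive set: 6 data strips + 1 parity

    def full_stripe_writes(data: bytes) -> list:
        """Split one stripe of data into strips, XOR them into a parity
        strip, and return every per-drive write the controller issues."""
        strips = [data[i:i + STRIP_SIZE]
                  for i in range(0, len(data), STRIP_SIZE)]
        parity = bytearray(STRIP_SIZE)
        for strip in strips:                # this XOR pass is the CPU cost
            for j, byte in enumerate(strip):
                parity[j] ^= byte
        return strips + [bytes(parity)]     # 6 data writes + 1 parity write

    writes = full_stripe_writes(b"x" * (STRIP_SIZE * DATA_DRIVES))
    print(len(writes))                      # -> 7 drive operations per stripe

The byte-by-byte loop is obviously far slower than the controller's XOR 
engine, but the shape of the work is the same: every stripe costs a pass 
over all the data on top of the writes themselves.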

For a RAID1 pair, the operations are simpler:

1> Generate a write operation to the primary target drive
2> Generate a write operation to the mirror drive

Across the board, this is less intensive than RAID5 operation. The offsetting 
factor here is that the controller would have to support multiple RAID1 sets 
where it previously supported one RAID5 set. I'd venture to say that the load 
on the controller itself would stay about the same using 4 RAID1 sets vs 1 
huge RAID5 set: the number of write operations dispatched to the drives would 
be roughly the same aggregated over time, and you'd be saving the overhead of 
calculating the parity information.
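
Putting crude per-request numbers on that (same assumed drive counts as the 
sketch above, not measurements of your hardware):

    # One large write request under each layout (illustrative counts).
    def raid5_request(drives: int = 7):
        # full stripe: a write per data drive, a parity write, an XOR pass
        return {"drive_writes": drives, "xor_passes": 1}

    def raid1_request():
        # the request lands on a single mirror pair; the other pairs
        # stay free to service other clients in parallel
        return {"drive_writes": 2, "xor_passes": 0}

    print(raid5_request())    # {'drive_writes': 7, 'xor_passes': 1}
    print(raid1_request())    # {'drive_writes': 2, 'xor_passes': 0}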

>
> But I am really not even sure that what we're seeing here is a problem
> with the speed of the RAID controller.  From some other reading I have
> done, it seems that grabbing up RAM to cache writes and combine it all
> into one big write is something that the 2.4 kernel series is rather
> notorious for.  I saw an article/review of external RAID subsystems
> (both SCSI and ATA-to-SCSI type) which said the same thing - that
> Windows 2000 servers were a lot better at asynchronous I/O than kernel
> 2.4-based Linux, and proceeded to describe much of the same malady I
> have been seeing here.  They did say that a lot of work is going into
> newer Linux kernels to make it better at async disk I/O.
>

Yes, the pagecache becomes a vacuum cleaner during I/O-intensive periods. 
This is being looked into in the development series (cachiness tuning). One 
thing you might try:

http://people.redhat.com/alikins/system_tuning.html

Specifically the section on tuning the I/O Elevator.
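
For reference, on 2.4 kernels the elevator is tuned per block device with 
elvtune (from util-linux). Something along these lines; the device name and 
values below are just placeholders to experiment with, not recommendations:

    # show the current elevator settings for the array's block device
    elvtune /dev/sda

    # lower the read/write latency limits, then watch how the load behaves
    elvtune -r 1024 -w 2048 /dev/sda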

www.missioncriticallinux.com/orph/ServerNotes.pdf also has some interesting 
notes in it.

One other thing: if your RAID controller's cache is set to run in "write-back" 
mode, try disabling that. Write-back caches on RAID controllers will 
aggregate/delay writes as well. Might be worth looking into (I know it tends 
to kill performance on my LSI MegaRAID at times).



> >
> >/
> >/var
> >/tmp
> >/usr
> >/usr/local
>
> So on these (above), have them at least on separate partitions.

Yes, put these on the same disk set (as they aren't I/O intensive by far), but 
keep them on separate partitions. You could then allocate the remainder of 
that disk set to /usr/local and put one of the above tasks there, etc.
 
I believe there's an added benefit here that you're overlooking :). If /, 
/var, /tmp, /usr, etc. are all on the same filesystem, one ounce of fs 
corruption hoses your whole machine. With them split up, /var or /tmp can get 
whacked all to hell, but your machine will still boot :). 
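
To sketch what that might look like in /etc/fstab (device names and 
partition numbers below are made up for illustration; substitute your 
actual system disks and RAID1 pairs):

    # hypothetical: sda = system disk set, sdb/sdc/sdd = RAID1 pairs
    /dev/sda1  /                     ext3  defaults  1 1
    /dev/sda2  /var                  ext3  defaults  1 2
    /dev/sda3  /tmp                  ext3  defaults  1 2
    /dev/sda5  /usr                  ext3  defaults  1 2
    /dev/sda6  /usr/local            ext3  defaults  1 2
    /dev/sdb1  /usr/local/mysql      ext3  defaults  1 2
    /dev/sdc1  /usr/local/webs       ext3  defaults  1 2
    /dev/sdd1  /usr/local/filestore  ext3  defaults  1 2

Each filesystem gets its own journal that way, and damage to one doesn't 
drag the others down with it.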


>  Possibly the same drive, but at least separate partitions? (which would
> give them separate journals).  And on the ones below:
> >and create special mounts for your samba, mysql, webroot (NFS), mail
> > (NFS), stuff.
> >
> >/usr/local/mysql
> >/usr/local/webs
> >/usr/local/filestore
>
> since this is where the majority of the real file activity is going on,
> put each of these on separate drives (or RAID1 arrays), so we not only
> have separate journals, but separate spindles too?
>
> Jeremy thank you so much for your reply.  This has really given me a lot
> to chew on.  And looking at my watch I see that it's Friday again..
> meaning I can actually work on this for a few days... <grin>.
>
> TTYL,
> vinnie

Hope things go well, 

Cheers
Jeremy



_______________________________________________

Ext3-users@redhat.com
https://listman.redhat.com/mailman/listinfo/ext3-users
