On 04/26/2011 05:48 PM, Mohit Anchlia wrote:
> I am not sure how valid this performance url is
>
> http://www.gluster.com/community/documentation/index.php/Guide_to_Optimizing_GlusterFS
>
> Does it make sense to separate out the journal and create mkfs -I 256?
>
> Also, if I already have a file system on a different partition can I
> still use it to store journal from other partition without corrupting
> the file system?

Journals are small and write-heavy.  You really want a raw device for
them; you do not want file system caching underneath them.  A raw
partition for an external journal is best, rather than a file sitting
on another, already-populated file system.  (Rough commands for this
are at the bottom of this mail.)

Also, understand that ext* suffers badly under intense parallel loads.
Keep that in mind as you make your file system choice.

> On Thu, Apr 21, 2011 at 7:23 PM, Joe Landman
> <landman at scalableinformatics.com> wrote:
>> On 04/21/2011 08:49 PM, Mohit Anchlia wrote:
>>>
>>> After a lot of digging today I finally figured out that it's not
>>> really using a PERC controller but some Fusion MPT. Then it wasn't
>>> clear which
>>
>> PERC is a rebadged LSI based on the 1068E chip.
>>
>>> tool it supports. Finally I installed lsiutil and was able to change
>>> the cache size.
>>>
>>> [root at dsdb1 ~]# lspci|grep LSI
>>> 02:00.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068E
>>> PCI-Express Fusion-MPT SAS (rev 08)
>>
>> This looks like PERC.  These are roughly equivalent to the LSI 3081
>> series, and they are not fast units.  There is a variant of this that
>> does RAID6; it's usually available as a software update or plugin
>> module (button?) for it.  I might be thinking of the 1078 chip,
>> though.
>>
>> Regardless, these are fairly old designs.
>>
>>> [root at dsdb1 ~]# dd if=/dev/zero of=/data/big.file bs=128k count=40k
>>> oflag=direct
>>> 1024+0 records in
>>> 1024+0 records out
>>> 134217728 bytes (134 MB) copied, 0.742517 seconds, 181 MB/s
>>>
>>> I compared this with the SW RAID mdadm that I created yesterday on
>>> one of the servers and I get around 300MB/s. I will test out first
>>> with what we have before destroying and testing with mdadm.
>>
>> So the software RAID is giving you 300 MB/s and the hardware 'RAID' is
>> giving you ~181 MB/s?  Seems a pretty simple choice :)
>>
>> BTW: the 300 MB/s could also be a limitation of the PCIe channel
>> interconnect (or worse, if they hung the chip off a PCI-X bridge).
>> Motherboard vendors are generally loath to dedicate more than a few
>> PCIe lanes to handling SATA, networking, etc.  So typically you wind
>> up with very low-powered 'RAID' and 'SATA/SAS' on the motherboard,
>> connected by PCIe x2 or x4 at most.  A number of motherboards have
>> NICs that are served by a single PCIe x1 link.
>>
>>> Thanks for your help that led me to this path. Another question I had
>>> was when creating mdadm RAID does it make sense to use multipathing?
>>
>> Well, for a shared backend over a fabric, I'd say possibly.  For an
>> internally connected set, I'd say no.  Given what you are doing with
>> Gluster, I'd say that the additional expense/pain of setting up a
>> multipath scenario probably isn't worth it.
>>
>> Gluster lets you get many of these benefits at a higher level in the
>> stack, which, to a degree and in some use cases, obviates the need
>> for multipathing at a lower level.  I'd still suggest real RAID at
>> the lower level (RAID6, and sometimes RAID10, make the most sense)
>> for the backing store.
>>
>> --
>> Joseph Landman, Ph.D
>> Founder and CEO
>> Scalable Informatics, Inc.
>> email: landman at scalableinformatics.com
>> web  : http://scalableinformatics.com
>>        http://scalableinformatics.com/sicluster
>> phone: +1 734 786 8423 x121
>> fax  : +1 866 888 3112
>> cell : +1 734 612 4615
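P.S.  To make the journal bit concrete: if you carve out a small raw
partition for an external journal, the rough shape of the commands is
something like the following.  The device names are placeholders (I'm
assuming /dev/sdc1 is the spare raw partition and /dev/sdb1 holds the
data), so adjust for your layout.  Note the journal device's block size
has to match the file system's, hence the explicit -b 4096 on both.

  # format the raw partition as a dedicated external journal device
  mke2fs -O journal_dev -b 4096 /dev/sdc1

  # build the data file system with 256-byte inodes, pointing it at
  # that journal device
  mkfs.ext4 -I 256 -b 4096 -J device=/dev/sdc1 /dev/sdb1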
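On the mdadm side, a software RAID6 across (say) eight drives would
look something like this.  Again, just a sketch: the drive names and
chunk size are examples, not a tuned recommendation.

  # create the array
  mdadm --create /dev/md0 --level=6 --raid-devices=8 --chunk=256 \
        /dev/sd[b-i]

  # record it so it assembles at boot
  mdadm --detail --scan >> /etc/mdadm.conf

Then mkfs/mount it and rerun your streaming test (the same dd with
oflag=direct) to compare against the 1068E numbers.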
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615