On 06/14/2012 07:06 AM, Fernando Frediani (Qube) wrote:
> I think this discussion probably came up here already but I couldn't
> find much in the archives. Would you be able to comment on, or correct,
> whatever might look wrong?
>
> What options do people think are more adequate to use with Gluster in
> terms of RAID underneath, with a good balance between cost, usable
> space and performance? I have thought about two main options, with
> their pros and cons.
>
> *No RAID (individual hot swappable disks):*
>
> Each disk is a brick individually (server:/disk1, server:/disk2, etc),
> so no RAID controller is required. As the data is replicated, if one
> disk fails the data must exist on another disk on another node.

For this to work well, you need the ability to mark a disk as failed
and as ready for removal, or to migrate all data on a disk over to a
new disk. Gluster only has the last capability, and doesn't have the
rest. You still need additional support in the OS and tool sets. The
tools we've developed for DeltaV and siFlash help in this regard,
though I wouldn't suggest using Gluster in this mode.

> _Pros_:
>
> Cheaper to build as there is no cost for an expensive RAID controller.

If a $500USD RAID adapter saves you $1000USD of time/expense over its
lifetime due to failed disk alerts, hot swap autoconfiguration, etc.,
is it "really" expensive? Of course, if you are at a university where
you have infinite amounts of cheap labor, sure, it's expensive: it is
cheaper to manage by throwing grad/undergrad students at it than it is
to manage with an HBA. That is, the word "expensive" has different
meanings in different contexts ... and in storage, the $500USD adapter
may easily help reduce costs elsewhere in the system (usually in disk
lifecycle management, as RAID's major purpose in life is to give you,
the administrator, a fighting chance to replace a failed device before
you lose your data).

> Improved performance as writes have to be done only on a single disk,
> not on the entire RAID5/6 array.

Good for tiny writes. Bad for larger writes (>64kB).

> Makes better use of the raw space as there is no disk for parity, as
> on a RAID 5/6.
>
> _Cons_:
>
> If a failed disk gets replaced, the data needs to be replicated over
> the network (not a big deal if using Infiniband or a 1Gbps+ network).

For a 100 MB/s pipe (streaming disk read, which you don't normally get
when you copy random files to/from disk), 1 GB = 10 seconds and
1 TB = 10,000 seconds. This is the best case scenario. In reality, you
will get some fractional portion of that disk read/write speed. So
expect 10,000 seconds per TB as the most optimistic (and unrealistic)
estimate ... a lower bound on the time. (See the rough sketch at the
end of this section.)

> The biggest file size is the size of one disk if using the
> Distributed volume type.

For some users this is not a problem, though several years ago we had
users wanting to read and write *single* TB sized files.

> In this case, does anyone know whether a replacement for a failed
> disk needs to be manually formatted and mounted?

In this model, yes. This is why the RAID adapter saves time, unless
you have written/purchased "expensive" tools to do similar things.
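To make the arithmetic above concrete, here is a minimal
back-of-envelope sketch (plain Python, nothing Gluster-specific; the
100 MB/s pipe and the 30% efficiency figure are illustrative
assumptions, not measurements) of how long re-replicating a failed
brick over the network takes:

    # Rough estimate of re-replication time for a failed brick.
    # pipe_mb_s and efficiency are assumptions -- adjust for your
    # network and workload (random small files run far below
    # streaming rates).
    def rereplication_time_s(data_tb, pipe_mb_s=100.0, efficiency=1.0):
        data_mb = data_tb * 1_000_000      # 1 TB = 1,000,000 MB (decimal)
        return data_mb / (pipe_mb_s * efficiency)

    for tb in (1, 2, 3):
        best = rereplication_time_s(tb)                  # optimistic lower bound
        real = rereplication_time_s(tb, efficiency=0.3)  # assumed 30% of streaming rate
        print(f"{tb} TB: >= {best / 3600:.1f} h best case, "
              f"~{real / 3600:.1f} h at 30% efficiency")

At 100 MB/s that is roughly 2.8 hours per TB in the best case, and the
rebuild window only grows as the effective efficiency drops.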
> *RAID Controller:*
>
> Using a RAID controller with battery backup can improve the
> performance, especially by caching the writes in the controller's
> memory, but in the end one single array means the equivalent
> performance of one disk for each brick. Also, RAID requires having
> either 1 or 2 disks for parity.

For large reads/writes, you typically get N* disk performance (N being
the number of disks, reduced by the number of parity disks and hot
spares). For small reads/writes you get 1 disk (or less) performance.
Basically, optimal reads/writes will be in multiples of the stripe
width. Optimizing stripe width and chunk sizes for various
applications is something of a black art, in that overoptimization for
one size/app will negatively impact another.

> If using very cheap disks, probably better to use RAID 6; if using
> better quality ones, RAID 5 should be fine as, again, the data is
> replicated to another RAID 5 on another node.

If you have more than 6TB of data, use RAID6 or RAID10. RAID5
shouldn't be used for TB class storage on units with UCE rates of more
than 10^-17: you would hit a UCE on rebuild of a failed drive, which
would take out all your data ... not nice. (See the back-of-envelope
sketch at the end of this message.)

> _Pros_:
>
> Can create a larger array as a single brick, in order to fit bigger
> files when using the Distributed volume type.
>
> Disk rebuild should be quicker (and more automated?)

More generally, management is nearly automatic, modulo physically
replacing a drive.

> _Cons_:
>
> Extra cost of the RAID controller.

It's a cost-benefit analysis, and for lower end storage units that
analysis almost always comes out in favor of a reasonable RAID design.

> Performance of the array is equivalent to a single disk + RAID
> controller caching features.

No ... see above.

> RAID doesn't scale well beyond ~16 disks.

16 disks is the absolute maximum we would ever tie to a single RAID
(or HBA). Most RAID processor chips can't handle the calculations for
16 disks (compare the performance of RAID6 at 16 drives to that at 12
drives for similar sized chunks and "optimal" IO). In most cases, the
performance delta isn't 16/12, 14/10, 13/9 or similar; it's typically
a bit lower.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
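P.S. As a footnote to the UCE point above, here is a minimal
back-of-envelope sketch (plain Python; the 8 x 2TB array and the
per-bit UCE rates are illustrative assumptions, roughly in line with
the figures drive datasheets quote) of the expected number of
unrecoverable errors hit while reading the surviving disks during a
RAID5 rebuild:

    # Expected unrecoverable errors (UCEs) while re-reading the
    # surviving disks of a degraded RAID5 during rebuild.
    # Array size and UCE rates below are assumptions for illustration.
    def expected_uces_during_rebuild(n_disks, disk_tb, uce_rate_per_bit):
        bits_read = (n_disks - 1) * disk_tb * 1e12 * 8   # decimal TB -> bits
        return bits_read * uce_rate_per_bit

    for rate, label in ((1e-14, "10^-14 (typical consumer)"),
                        (1e-15, "10^-15 (typical enterprise)"),
                        (1e-17, "10^-17")):
        e = expected_uces_during_rebuild(8, 2, rate)
        print(f"8 x 2TB RAID5 rebuild at {label}: ~{e:.3f} expected UCEs")

With a 10^-14 class drive the expectation is on the order of one UCE
per rebuild, which is exactly the "take out all your data" scenario
above; only at 10^-17 or better does it become comfortably small.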