On 06/14/2012 07:06 AM, Fernando Frediani (Qube) wrote:
> I think this discussion probably came up here already but I couldn't
> find much in the archives. Would you be able to comment on, or correct,
> whatever might look wrong?
>
> What options do people think are more adequate to use with Gluster in
> terms of RAID underneath, with a good balance between cost, usable
> space and performance? I have thought about two main options, with
> their pros and cons.
>
> *No RAID (individual hot swappable disks):*
>
> Each disk is a brick individually (server:/disk1, server:/disk2, etc),
> so no RAID controller is required. As the data is replicated, if one
> disk fails the data must exist on another disk on another node.

For this to work well, you need the ability to mark a disk as failed
and as ready for removal, or to migrate all data on a disk over to a
new disk. Gluster only has the last capability, and doesn't have the
rest. You still need additional support in the OS and tool sets. The
tools we've developed for DeltaV and siFlash help in this regard,
though I wouldn't suggest using Gluster in this mode.

> _Pros_:
>
> Cheaper to build as there is no cost for an expensive RAID controller.

If a $500USD RAID adapter saves you $1000USD of time/expense over its
lifetime due to failed disk alerts, hot swap autoconfiguration, etc.,
is it "really" expensive? Of course, if you are at a university where
you have infinite amounts of cheap labor, sure, it's expensive: it is
cheaper to manage by throwing grad/undergrad students at it than it is
to manage with an HBA. That is, the word "expensive" has different
meanings in different contexts ... and in storage, the $500USD adapter
may easily help reduce costs elsewhere in the system (usually in disk
lifecycle management, as RAID's major purpose in life is to give you,
the administrator, a fighting chance to replace a failed device before
you lose your data).

> Improved performance as writes have to be done only on a single disk,
> not on the entire RAID5/6 array.

Good for tiny writes. Bad for larger writes (>64kB).

> Makes better use of the raw space as there is no disk for parity, as
> on a RAID 5/6.
>
> _Cons_:
>
> If a failed disk gets replaced, the data needs to be replicated over
> the network (not a big deal if using Infiniband or a 1Gbps+ network).

For a 100 MB/s pipe (streaming disk read, which you don't normally get
when you copy random files to/from disk), 1 GB = 10 seconds and
1 TB = 10,000 seconds. This is the best case scenario. In reality, you
will get some fractional portion of that disk read/write speed. So
expect 10,000 seconds per TB as the most optimistic (and unrealistic)
estimate ... a lower bound on the time. (See the rough sketch at the
end of this section.)

> The biggest file size is the size of one disk if using the
> Distributed volume type.

For some users this is not a problem, though several years ago we had
users wanting to read and write *single* TB sized files.

> In this case, does anyone know whether a replacement for a failed
> disk needs to be manually formatted and mounted?

In this model, yes. This is why the RAID adapter saves time, unless
you have written/purchased "expensive" tools to do similar things.
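To make the arithmetic above concrete, here is a minimal
back-of-envelope sketch (plain Python, nothing Gluster-specific; the
100 MB/s pipe and the 30% efficiency figure are illustrative
assumptions, not measurements) of how long re-replicating a failed
brick over the network takes:

    # Rough estimate of re-replication time for a failed brick.
    # pipe_mb_s and efficiency are assumptions -- adjust for your
    # network and workload (random small files run far below
    # streaming rates).
    def rereplication_time_s(data_tb, pipe_mb_s=100.0, efficiency=1.0):
        data_mb = data_tb * 1_000_000      # 1 TB = 1,000,000 MB (decimal)
        return data_mb / (pipe_mb_s * efficiency)

    for tb in (1, 2, 3):
        best = rereplication_time_s(tb)                  # optimistic lower bound
        real = rereplication_time_s(tb, efficiency=0.3)  # assumed 30% of streaming rate
        print(f"{tb} TB: >= {best / 3600:.1f} h best case, "
              f"~{real / 3600:.1f} h at 30% efficiency")

At 100 MB/s that is roughly 2.8 hours per TB in the best case, and the
rebuild window only grows as the effective efficiency drops.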
> *RAID Controller:*
>
> Using a RAID controller with battery backup can improve the
> performance, especially by caching the writes in the controller's
> memory, but in the end one single array means the equivalent
> performance of one disk for each brick. Also, RAID requires having
> either 1 or 2 disks for parity.

For large reads/writes, you typically get N* disk performance (N being
the number of disks, reduced by the number of parity disks and hot
spares). For small reads/writes you get 1 disk (or less) performance.
Basically, optimal reads/writes will be in multiples of the stripe
width. Optimizing stripe width and chunk sizes for various
applications is something of a black art, in that overoptimization for
one size/app will negatively impact another.

> If using very cheap disks, probably better to use RAID 6; if using
> better quality ones, RAID 5 should be fine as, again, the data is
> replicated to another RAID 5 on another node.

If you have more than 6TB of data, use RAID6 or RAID10. RAID5
shouldn't be used for TB class storage on units with UCE rates of more
than 10^-17: you would hit a UCE on rebuild of a failed drive, which
would take out all your data ... not nice. (See the back-of-envelope
sketch at the end of this message.)

> _Pros_:
>
> Can create a larger array as a single brick, in order to fit bigger
> files when using the Distributed volume type.
>
> Disk rebuild should be quicker (and more automated?)

More generally, management is nearly automatic, modulo physically
replacing a drive.

> _Cons_:
>
> Extra cost of the RAID controller.

It's a cost-benefit analysis, and for lower end storage units that
analysis almost always comes out in favor of a reasonable RAID design.

> Performance of the array is equivalent to a single disk + RAID
> controller caching features.

No ... see above.

> RAID doesn't scale well beyond ~16 disks.

16 disks is the absolute maximum we would ever tie to a single RAID
(or HBA). Most RAID processor chips can't handle the calculations for
16 disks (compare the performance of RAID6 at 16 drives to that at 12
drives for similar sized chunks and "optimal" IO). In most cases, the
performance delta isn't 16/12, 14/10, 13/9 or similar; it's typically
a bit lower.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
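P.S. As a footnote to the UCE point above, here is a minimal
back-of-envelope sketch (plain Python; the 8 x 2TB array and the
per-bit UCE rates are illustrative assumptions, roughly in line with
the figures drive datasheets quote) of the expected number of
unrecoverable errors hit while reading the surviving disks during a
RAID5 rebuild:

    # Expected unrecoverable errors (UCEs) while re-reading the
    # surviving disks of a degraded RAID5 during rebuild.
    # Array size and UCE rates below are assumptions for illustration.
    def expected_uces_during_rebuild(n_disks, disk_tb, uce_rate_per_bit):
        bits_read = (n_disks - 1) * disk_tb * 1e12 * 8   # decimal TB -> bits
        return bits_read * uce_rate_per_bit

    for rate, label in ((1e-14, "10^-14 (typical consumer)"),
                        (1e-15, "10^-15 (typical enterprise)"),
                        (1e-17, "10^-17")):
        e = expected_uces_during_rebuild(8, 2, rate)
        print(f"8 x 2TB RAID5 rebuild at {label}: ~{e:.3f} expected UCEs")

With a 10^-14 class drive the expectation is on the order of one UCE
per rebuild, which is exactly the "take out all your data" scenario
above; only at 10^-17 or better does it become comfortably small.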