On Thu, Jun 14, 2012 at 11:06:32AM +0000, Fernando Frediani (Qube) wrote:
> No RAID (individual hot-swappable disks):
>
> Each disk is a brick individually (server:/disk1, server:/disk2, etc.)
> so no RAID controller is required. As the data is replicated, if one
> disk fails the data must exist on another disk on another node.
>
> Pros:
>
> Cheaper to build as there is no cost for an expensive RAID controller.

Except that software (md) RAID is free and works with an HBA.

> Improved performance as writes have to be done only on a single disk,
> not on the entire RAID5/6 array.
>
> Makes better use of the raw space as there is no disk used for parity
> as in a RAID 5/6 array.
>
> Cons:
>
> If a failed disk gets replaced, the data needs to be replicated over
> the network (not a big deal if using InfiniBand or a 1Gbps+ network).
>
> The biggest file size is the size of one disk if using a Distributed
> volume type.

Additional cons:

* You will probably need to write your own tools to monitor and notify
  you when a disk fails (whereas there are easily-available existing
  tools for md RAID, including E-mail notifications and SNMP
  integration).

* The process of swapping a disk is not a simple hot-swap: you need to
  replace the failed drive, mkfs a new filesystem, and re-introduce it
  into the gluster volume. This is something you will need to document
  procedures for and test carefully, whereas RAID swaps are
  comparatively a no-brainer.

* For a large configuration with hundreds of drives, it can become
  ungainly to have a gluster volume with hundreds of bricks.

> RAID doesn't scale well beyond ~16 disks

But you can group your disks into multiple RAID volumes.

> Attaching a JBOD to a node and creating multiple RAID arrays (or a
> single server with more disk slots) instead of adding a new node can
> save power (no need for extra CPU, memory, motherboard), but with
> multiple bricks on the same node the data might end up replicated
> inside the same node, making the downtime of a node critical. Or is
> Gluster smart enough to replicate data to a brick on a different node?

It's not automatic; you configure it explicitly. If your replica count
is 2 then you give it pairs of bricks, and data will be replicated onto
each brick in the pair. It's your responsibility to ensure that those
two bricks are on different servers, if high availability is your
concern.

Another alternative to consider: RAID10 on each node. It eliminates the
performance penalty of RAID5/6, and indeed will give you improved read
performance compared to single disks, but it halves your available
storage capacity.

You can of course mix and match, e.g. RAID5 for backup volumes, RAID10
for highly active read/write volumes, some gluster volumes replicated
and some not, etc. This can become a management headache if it gets too
complex, though.
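
To illustrate the monitoring point above: md RAID ships with a monitor
mode that can mail you when an array degrades. A minimal sketch (the
mail address and config path are placeholders; many distros start this
service for you automatically):

    # Watch all md arrays and send mail on failure/degraded events.
    mdadm --monitor --scan --daemonise --mail=admin@example.com

    # Or set it persistently in mdadm.conf (often /etc/mdadm/mdadm.conf):
    #   MAILADDR admin@example.com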
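
On the replica-pair ordering: gluster groups the bricks you list into
replica sets in the order given, so the brick ordering is what decides
which servers hold each copy. A rough sketch with made-up names (volume
name and brick paths are placeholders):

    # replica 2: each consecutive pair of bricks forms one replica set,
    # so alternating server1/server2 keeps the two copies of any file
    # on different servers.
    gluster volume create myvol replica 2 \
        server1:/export/disk1 server2:/export/disk1 \
        server1:/export/disk2 server2:/export/disk2

Nothing forces the pairs to span two servers, so it is worth
double-checking the ordering before you create the volume.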
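
And for the RAID10-on-each-node option, a minimal md sketch (device
names and filesystem choice are placeholders; adjust for your own
disks):

    # Four-disk RAID10, then a filesystem to use as a gluster brick.
    mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde
    mkfs.xfs /dev/md0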