On Thu, Jun 14, 2012 at 11:06:32AM +0000, Fernando Frediani (Qube) wrote:
> No RAID (individual hot-swappable disks):
>
> Each disk is a brick individually (server:/disk1, server:/disk2, etc.)
> so no RAID controller is required. As the data is replicated, if one
> disk fails the data must exist on another disk on another node.
>
> Pros:
>
> Cheaper to build as there is no cost for an expensive RAID controller.

Except that software (md) RAID is free and works with an HBA.

> Improved performance as writes have to be done only on a single disk,
> not on the entire RAID5/6 array.
>
> Makes better use of the raw space as there is no disk used for parity
> as in a RAID 5/6 array.
>
> Cons:
>
> If a failed disk gets replaced, the data needs to be replicated over
> the network (not a big deal if using InfiniBand or a 1Gbps+ network).
>
> The biggest file size is the size of one disk if using a Distributed
> volume type.

Additional cons:

* You will probably need to write your own tools to monitor and notify
  you when a disk fails (whereas there are easily-available existing
  tools for md RAID, including E-mail notifications and SNMP
  integration).

* The process of swapping a disk is not a simple hot-swap: you need to
  replace the failed drive, mkfs a new filesystem, and re-introduce it
  into the gluster volume. This is something you will need to document
  procedures for and test carefully, whereas RAID swaps are
  comparatively a no-brainer.

* For a large configuration with hundreds of drives, it can become
  ungainly to have a gluster volume with hundreds of bricks.

> RAID doesn't scale well beyond ~16 disks

But you can group your disks into multiple RAID volumes.

> Attaching a JBOD to a node and creating multiple RAID arrays (or a
> single server with more disk slots) instead of adding a new node can
> save power (no need for extra CPU, memory, motherboard), but with
> multiple bricks on the same node the data might end up replicated
> inside the same node, making the downtime of a node critical. Or is
> Gluster smart enough to replicate data to a brick on a different node?

It's not automatic; you configure it explicitly. If your replica count
is 2 then you give it pairs of bricks, and data will be replicated onto
each brick in the pair. It's your responsibility to ensure that those
two bricks are on different servers, if high availability is your
concern.

Another alternative to consider: RAID10 on each node. It eliminates the
performance penalty of RAID5/6, and indeed will give you improved read
performance compared to single disks, but it halves your available
storage capacity.

You can of course mix and match, e.g. RAID5 for backup volumes, RAID10
for highly active read/write volumes, some gluster volumes replicated
and some not, etc. This can become a management headache if it gets too
complex, though.
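
To illustrate the monitoring point above: md RAID ships with a monitor
mode that can mail you when an array degrades. A minimal sketch (the
mail address and config path are placeholders; many distros start this
service for you automatically):

    # Watch all md arrays and send mail on failure/degraded events.
    mdadm --monitor --scan --daemonise --mail=admin@example.com

    # Or set it persistently in mdadm.conf (often /etc/mdadm/mdadm.conf):
    #   MAILADDR admin@example.com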
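
On the replica-pair ordering: gluster groups the bricks you list into
replica sets in the order given, so the brick ordering is what decides
which servers hold each copy. A rough sketch with made-up names (volume
name and brick paths are placeholders):

    # replica 2: each consecutive pair of bricks forms one replica set,
    # so alternating server1/server2 keeps the two copies of any file
    # on different servers.
    gluster volume create myvol replica 2 \
        server1:/export/disk1 server2:/export/disk1 \
        server1:/export/disk2 server2:/export/disk2

Nothing forces the pairs to span two servers, so it is worth
double-checking the ordering before you create the volume.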
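
And for the RAID10-on-each-node option, a minimal md sketch (device
names and filesystem choice are placeholders; adjust for your own
disks):

    # Four-disk RAID10, then a filesystem to use as a gluster brick.
    mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde
    mkfs.xfs /dev/md0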