On Mon, Oct 10, 2011 at 8:26 PM, Miles Fidelman <mfidelman at meetinghouse.net> wrote:
> Hi,
>
> I have a small cluster that I use to host a collection of Xen virtual
> machines.  I just expanded from 2 nodes to 4 nodes and am looking for some
> advice re. configuring a storage subsystem.
>
> The current (2-node) configuration is simple:
> - 4 disks per node
> - md-based raid
> - lvm
> - DRBD to replicate (some) volumes between the two nodes
> - (some) VMs set up for auto-failover on node failure (pacemaker, etc.)
>
> In moving to 4 nodes, I'd like to add some flexibility to move VMs across
> all 4 nodes, but... that requires using something other than DRBD to
> replicate volumes.  I'm thinking of something with the following
> characteristics:
>
> - 4-node storage cluster (4 drives per node, total of 16 drives in the
> storage pool)
> - 4-node VM cluster
> - using the SAME 4 nodes for both
> - note: I've got 4 gigE ports to play with on each box (plan on using 2 for
> outside access, 2 for storage/heartbeat networking)
>
> GlusterFS stands out as the package that seems most capable of supporting
> this (if we were using KVM, I'd probably look at Sheepdog as well).
>
> So... a few questions:
>
> - it looks like running replicated volumes, across 4 nodes, will provide for
> redundancy and support migration/failover (am I right in this? or should I
> be looking at running RAID on the individual nodes as well?)

If you create a volume with replica count = 2, it creates a distributed
replicated volume. (Imagine an intelligent RAID-10.) You may choose to use
disk-level RAID too, as a second level of protection. It is a small
investment for added reliability.

> - what kind of performance hit is involved in replicated volumes?

Synchronous replication does take a performance hit: it treats each write
as a transaction across N nodes. How large the hit is varies from
application to application.
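For a layout like the 4-node x 4-drive one above, the order of bricks matters: GlusterFS groups each run of `replica` consecutive bricks in the list given to `gluster volume create` into one replica set, so both copies of a file can land on the same server if the list is ordered carelessly. The Python below is a minimal planning sketch under stated assumptions, not a GlusterFS tool: the hostnames node1 to node4, the brick paths /export/d1 to /export/d4, the 2 TB drive size and the volume name "vmstore" are all hypothetical, and the capacity figure ignores any local RAID underneath the bricks.

```python
"""Sketch: plan a 4-node x 4-drive GlusterFS distributed-replicated layout.

Hostnames, brick paths, drive size and the volume name are hypothetical.
"""
from collections import namedtuple

Brick = namedtuple("Brick", ["host", "path"])


def interleaved_bricks(hosts, paths):
    """List bricks drive-by-drive across hosts: n1:d1, n2:d1, ..., n1:d2, ..."""
    return [Brick(h, p) for p in paths for h in hosts]


def replica_sets(bricks, replica):
    """Group consecutive bricks into replica sets of size `replica`."""
    if len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def sets_sharing_a_host(sets):
    """Return replica sets that hold more than one copy on the same host."""
    return [s for s in sets if len({b.host for b in s}) < len(s)]


def usable_capacity(num_bricks, brick_size, replica):
    """Usable space: raw space divided by the number of copies kept."""
    return num_bricks * brick_size / replica


def create_command(volume, bricks, replica):
    """Build (but do not run) a `gluster volume create` command line."""
    brick_args = " ".join(f"{b.host}:{b.path}" for b in bricks)
    return f"gluster volume create {volume} replica {replica} transport tcp {brick_args}"


if __name__ == "__main__":
    hosts = ["node1", "node2", "node3", "node4"]
    paths = ["/export/d1", "/export/d2", "/export/d3", "/export/d4"]

    bricks = interleaved_bricks(hosts, paths)
    sets = replica_sets(bricks, 2)
    print(len(sets), "replica sets; badly placed:", len(sets_sharing_a_host(sets)))

    # Listing all of node1's drives first would pair node1 with itself.
    naive = [Brick(h, p) for h in hosts for p in paths]
    print("naive ordering, badly placed:", len(sets_sharing_a_host(replica_sets(naive, 2))))

    # Assuming 2 TB drives: 2 copies vs. 4 copies of every file.
    print(usable_capacity(16, 2.0, 2), "TB usable with replica 2")
    print(usable_capacity(16, 2.0, 4), "TB usable with replica 4")
    print(create_command("vmstore", bricks, 2))
```

With the interleaved order, every one of the 8 replica pairs spans two different hosts, whereas listing each node's drives together puts both copies on the same host in all 8 pairs. The capacity helper also shows why replica 2 is the more disk-efficient choice: half of the raw space is usable, versus a quarter when keeping 4 copies.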
> - is there anything more efficient in disk use (i.e., mirroring 4 copies
> eats up lots of disk, is there anything equivalent to RAID 5/6 that is a
> little more efficient while maintaining redundancy?)

Just create a distributed-mirror volume with replica count = 2. That keeps
two copies of each file rather than four, so half of the raw disk space is
usable. You can also write a script that automatically runs replace-brick
onto spare disk space in case of node failures. (This way, the system
rebuilds itself if a mirror member does not come back in time.)

> - am I missing anything (either re. GlusterFS or other alternatives)

Ceph (in development) and Sheepdog (KVM-specific) are two other projects.

> Thanks very much for any suggestions and advice.
>
> Miles Fidelman
>
> --
> In theory, there is no difference between theory and practice.
> In practice, there is.   .... Yogi Berra
>

--
Anand Babu Periasamy
Blog [ http://www.unlocksmith.org ]
Twitter [ http://twitter.com/abperiasamy ]

Imagination is more important than knowledge --Albert Einstein