On 06/29/2010 11:31 PM, Emmanuel Noobadmin wrote:
> With the nufa volumes, a file is only written to one of the volumes
> listed in its definition.
> If the volume is a replicate volume, then the file is replicated on
> each of the volumes listed in its definition.
>
> e.g. in this case
> volume my_nufa
>   type cluster/nufa
>   option local-volume-name rep1
>   subvolumes rep0 rep1 rep2
> end-volume
>
> A file is only found in one of rep0, rep1, or rep2. If it was on rep2,
> then it would be inaccessible if rep2 fails, such as a network failure
> cutting rep2 off.

Yes, but rep2 as a whole could only fail if all of its component volumes - one on an app node and one on a data node - failed simultaneously. That's about as good protection as you're going to get without increasing your replication level (and therefore decreasing both performance and effective storage utilization).

> Then when I add a rep3, gluster should automatically start putting new
> files onto it.
>
> At this point though, it seems that if I use nufa, I would have an
> issue if I add a purely storage-only rep3 instead of an app+storage
> node. None of the servers will use it until their local volume reaches
> max capacity, right? :D
>
> So if I preferred to have the load spread out more evenly, I should
> then be using cluster/distribute?

If you want even distribution across different or variable numbers of app/data nodes, then cluster/distribute would be the way to go. For example, you could create a distribute set across the storage nodes and a nufa set across the app nodes, and then replicate between the two (each app node preferring the local member of the nufa set). You'd lose the ability to suppress app-node-to-app-node communication with different read-subvolume assignments, though, and in my experience replicate over distribute doesn't work quite as well as the other way around.

Another option, since you do have a fast interconnect, would be to place all of the permanent storage on the data nodes and use the storage on the app nodes only for caching (as we had discussed). Replicate pair-wise or diagonally between the data nodes, distribute across the replica sets, and you'd have a pretty good solution to handle future expansion.
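
To make that last option a bit more concrete, here's a rough client-side volfile sketch of "replicate pair-wise between data nodes, distribute across the replica sets". The host names and brick names are just placeholders and I haven't tested this, so treat it as an illustration rather than a working config:

# Bricks exported by four data nodes (names are placeholders).
volume data1
  type protocol/client
  option transport-type tcp
  option remote-host data-node1
  option remote-subvolume brick
end-volume

volume data2
  type protocol/client
  option transport-type tcp
  option remote-host data-node2
  option remote-subvolume brick
end-volume

volume data3
  type protocol/client
  option transport-type tcp
  option remote-host data-node3
  option remote-subvolume brick
end-volume

volume data4
  type protocol/client
  option transport-type tcp
  option remote-host data-node4
  option remote-subvolume brick
end-volume

# Pair-wise replication between data nodes.
volume rep0
  type cluster/replicate
  subvolumes data1 data2
end-volume

volume rep1
  type cluster/replicate
  subvolumes data3 data4
end-volume

# Distribute across the replica sets.
volume dist
  type cluster/distribute
  subvolumes rep0 rep1
end-volume

Expansion then just means bringing up another pair of data nodes, defining another replicate volume over them, and appending it to the distribute's subvolumes list.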