On 06/29/2010 06:23 AM, Emmanuel Noobadmin wrote:
> I've been trying to find a solution to achieve the following
> objective: minimum delay redundant network storage for virtualized
> server and think gluster might be what I need after throwing out
> options like Lustre, dmraid on Openfiler etc.
>
> The configuration in mind is currently this.
>
> Application Servers
> -> Runs a few VM guest OS
> -> Runs gluster client/server
> -> VM machine images then stored on mirrored gluster volumes.
>
> There will be two storage servers with physical RAID.
>
> The concept is that
> 1. physical RAID 1 catches single disk failure on the storage server
> 2. gluster mirror on the application server catches single machine
> failure of the storage servers
>
> . . .
>
> Would this work or am I missing something?

Not only would this work, but my impression is that it's a pretty
common use of GlusterFS.  The one thing I'd add is that, if you
already have or ever might have more than two application servers,
you use cluster/nufa as well as cluster/replicate (which is what
volgen's "RAID 1" actually does).  The basic idea here is to set up
two (or more) subvolumes on each server, like this:

  srv0vol0    srv1vol0    srv2vol0
  srv0vol1    srv1vol1    srv2vol1

Then you replicate "diagonally" with read-subvolume pointing to the
top row:

  volume rep0
    type cluster/replicate
    option read-subvolume srv0vol0
    subvolumes srv0vol0 srv1vol1
  end-volume

  volume rep1
    type cluster/replicate
    option read-subvolume srv1vol0
    subvolumes srv1vol0 srv2vol1
  end-volume

  volume rep2
    type cluster/replicate
    option read-subvolume srv2vol0
    subvolumes srv2vol0 srv0vol1
  end-volume

Lastly, you apply NUFA with "local-volume-name" on each node pointing
to the replicated volume with its read-subvolume on the same machine.
So, on node 1:

  volume my_nufa
    type cluster/nufa
    option local-volume-name rep1
    subvolumes rep0 rep1 rep2
  end-volume

With this type of configuration, files created on node 1 will be
written to srv1vol0/srv2vol1 and read from srv1vol0.  Note that you
don't need separate disks or anything to set up multiple volumes on a
node; they can just be different directories, though if they're
directories within the same local filesystem then "df" on the
GlusterFS filesystem can be misleading.  Extending the approach from
three servers to any N should be pretty obvious, and you can do the
same thing with cluster/distribute instead of cluster/nufa (they
actually use the same code) if strong locality is not a requirement.

> 3. Avoids any problems caused by a heartbeat/failover delay.
> 4. If an application server die, the VM images are still on the
> gluster volumes, I can simply distribute the downed VMs to the other
> running application server by loading the images.

You're not really getting rid of heartbeat/failover delay, so much as
relying on functionally equivalent behavior within GlusterFS.  Also,
you'll still need some sort of heartbeat to detect that an
application server has died.  Putting your images on GlusterFS makes
it possible for guests on multiple machines to access them, but it's
still a bad idea for them to do so simultaneously.
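Going back to the volfile sketch for a moment: the rep* blocks above
assume the srv*vol* names are already defined earlier in the same
client volfile as protocol/client subvolumes pointing at bricks
exported from each server.  A rough sketch of what that might look
like -- the hostnames (server0..server2), brick name (vol0), and so on
are placeholders for whatever your setup actually uses, not volgen
output:

  # Client side: one protocol/client block per remote brick.
  # Names and hosts below are illustrative only.
  volume srv0vol0
    type protocol/client
    option transport-type tcp
    option remote-host server0
    option remote-subvolume vol0
  end-volume

  # ...repeated for srv0vol1, srv1vol0, srv1vol1, srv2vol0 and
  # srv2vol1, adjusting remote-host and remote-subvolume each time.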
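On the server side, each node would export its two directories as
bricks, along these lines (again just a sketch with made-up names and
paths; you'd want to lock down the auth rule rather than allow *, and
repeat the posix/locks pair for vol1):

  # Server side: one posix brick per exported directory.
  volume posix-vol0
    type storage/posix
    # placeholder path
    option directory /data/vol0
  end-volume

  volume vol0
    type features/locks
    subvolumes posix-vol0
  end-volume

  volume server
    type protocol/server
    option transport-type tcp
    option auth.addr.vol0.allow *
    subvolumes vol0
  end-volume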
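And if strong locality really isn't a requirement and you go with
cluster/distribute as mentioned above, the top-level block is the same
minus the NUFA-specific option, something like (the name my_dht is
just a placeholder):

  volume my_dht
    type cluster/distribute
    subvolumes rep0 rep1 rep2
  end-volume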