On 6/29/10, Jeff Darcy <jdarcy at redhat.com> wrote:
> Not only would this work, but my impression is that it's a pretty common
> use of GlusterFS. The one thing I'd add is that, if you already have or
> ever might have more than two application servers, you use cluster/nufa
> as well as cluster/replicate (which is what volgen's "RAID 1" actually
> does). The basic idea here is to set up two (or more) subvolumes on
> each server, like this:
>
>   srv0vol0   srv1vol0   srv2vol0
>   srv0vol1   srv1vol1   srv2vol1
>
> Then you replicate "diagonally" with read-subvolume pointing to the top
> row:
>
>   volume rep0
>     type cluster/replicate
>     option read-subvolume srv0vol0
>     subvolumes srv0vol0 srv1vol1
>   end-volume
>
>   volume rep1
>     type cluster/replicate
>     option read-subvolume srv1vol0
>     subvolumes srv1vol0 srv2vol1
>   end-volume
>
>   volume rep2
>     type cluster/replicate
>     option read-subvolume srv2vol0
>     subvolumes srv2vol0 srv0vol1
>   end-volume
>
> Lastly, you apply NUFA with "local-volume-name" on each node pointing to
> the replicated volume with its read-subvolume on the same machine. So,
> on node 1:
>
>   volume my_nufa
>     type cluster/nufa
>     option local-volume-name rep1
>     subvolumes rep0 rep1 rep2
>   end-volume
>
> With this type of configuration, files created on node 1 will be written
> to srv1vol0/srv2vol1 and read from srv1vol0. Note that you don't need
> separate disks or anything to set up multiple volumes on a node; they
> can just be different directories, though if they're directories within
> the same local filesystem then "df" on the GlusterFS filesystem can be
> misleading. Extending the approach from three servers to any N should
> be pretty obvious, and you can do the same thing with cluster/distribute
> instead of cluster/nufa (they actually use the same code) if strong
> locality is not a requirement.

Thank you very much for the detailed explanation! Finally seeing a
definitive answer is really like light at the end of a long and
sleepless tunnel :)

> You're not really getting rid of heartbeat/failover delay, so much as
> relying on functionally equivalent behavior within GlusterFS. Also,
> you'll still need some sort of heartbeat to detect that an application
> server has died. Putting your images on GlusterFS makes it possible for
> guests on multiple machines to access them, but it's still a bad idea
> for them to do so simultaneously.

Switching images would be done manually, so simultaneous access is
unlikely, unless I can figure out whether KVM has live-migration
functionality similar to Xen's Remus. Most likely I would run two or
more physical machines whose VMs fail over to each other, to cover the
case of a single machine failing, along with a pair of storage servers.
In the case of a total failure where both the primary and secondary VM
hosts die physically, I'd roll in a new machine and load up the VM
images, which would still be safe on the Gluster data servers.

So in that case, would I be correct that my configuration, assuming a
basic setup of 2 physical VM host servers and 2 storage servers, would
look something like this?

  volume rep0
    type cluster/replicate
    option read-subvolume vmsrv0vol0
    subvolumes vmsrv0vol0 datasrv0vol0 datasrv1vol0
  end-volume

  volume rep1
    type cluster/replicate
    option read-subvolume vmsrv1vol0
    subvolumes vmsrv1vol0 datasrv0vol0 datasrv1vol0
  end-volume

  volume my_nufa
    type cluster/nufa
    option local-volume-name rep0
    subvolumes rep0 rep1
  end-volume

Or did I lose my way somewhere? :)
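In case it helps, here is roughly how I picture the underlying volumes
behind names like datasrv0vol0 being defined in hand-written volfiles;
this is just a sketch, and the hostname and directory path below are
placeholders I made up:

  # on datasrv0: export a directory as a brick
  # (per your note, a plain directory rather than a dedicated disk)
  volume vol0
    type storage/posix
    option directory /data/glusterfs/vol0
  end-volume

  volume server
    type protocol/server
    option transport-type tcp
    option auth.addr.vol0.allow *
    subvolumes vol0
  end-volume

  # on each client: attach to that brick under the name used in the
  # replicate/nufa graph above
  volume datasrv0vol0
    type protocol/client
    option transport-type tcp
    option remote-host datasrv0
    option remote-subvolume vol0
  end-volume
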
Does it make any sense to replicate across all three machines, or
should I simply spec the VM servers with tiny drives and put everything
on the Gluster storage servers, which I suppose would impact
performance severely?
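
P.S. Just to make sure I follow the "extending to any N" part: with
four servers, I take it the diagonal pairing would become

  rep0: srv0vol0 srv1vol1
  rep1: srv1vol0 srv2vol1
  rep2: srv2vol0 srv3vol1
  rep3: srv3vol0 srv0vol1

i.e. each rep<i> pairs vol0 of server i with vol1 of server (i+1) mod N,
and the NUFA volume on node i would set local-volume-name to rep<i>?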