On 06/30/2010 10:22 AM, Craig Box wrote:
> OK, so this brings me to Plan B.  (Feel free to suggest a plan C if
> you can.)
>
> I want to have six nodes, three in each availability zone, replicate
> a Mercurial repository.  Here's some art:
>
> [gluster c/s] [gluster c/s] | [gluster c/s] [gluster c/s]
>                             |
>        [gluster s]          |        [gluster s]
>        [OCFS 2]             |        [OCFS 2]
>        [ DRBD ] ----------- [ DRBD ]
>
> DRBD doing the cross-AZ replication, and a three node GlusterFS
> cluster inside each AZ.  That way, any one machine going down should
> still mean all the rest of the nodes can access the files.
>
> Sound believable?

OCFS2 is a shared-disk filesystem, and in EC2 neither ephemeral storage
nor EBS can be mounted on more than one instance simultaneously, so
you'd need something to provide a shared-disk abstraction within an AZ.
DRBD's dual-primary (active/active) mode can do this, and I think it's
even reentrant (DRBD calls this "stacking"), so that the devices created
this way can themselves be used as components of the inter-AZ
replication devices; there's a rough config sketch at the end of this
message.  However, active/active mode isn't recommended, and I don't
think you can connect more than two nodes this way.

What's really needed, and I'm slightly surprised doesn't already exist,
is a DRBD proxy that can be connected as a destination by several local
DRBD sources, and that preserves request order even across devices as it
becomes a DRBD source itself and ships those requests to another proxy
in another AZ (toy sketch of that idea below as well).  Linbit's DRBD
Proxy doesn't seem to be designed for that particular purpose.  The
considerations for dm-replicator are essentially the same, BTW.

An async/long-distance replication translator has certainly been a
frequent topic of discussion between me, the Gluster folks, and others.
I have plans to shoot for full N-way active/active replication, but with
that ambition comes complexity, so we'll probably see simpler forms
(e.g. two-way active/passive) much earlier.
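
To make the stacking idea concrete, here's roughly what it might look
like in 8.3-era drbd.conf syntax: a dual-primary resource inside one AZ
(for OCFS2 to sit on), with a stacked resource replicating it
asynchronously to the other AZ.  All hostnames, devices, and addresses
are made up, and per the caveats above this shows the shape of the
thing, not a tested or recommended configuration:

  resource r0 {                      # intra-AZ pair, shared-disk role
    protocol C;                      # synchronous inside the AZ
    net { allow-two-primaries; }     # needed for OCFS2 on top
    startup { become-primary-on both; }
    on node-a {
      device    /dev/drbd0;
      disk      /dev/xvdf;           # EBS volume (assumed name)
      address   10.0.1.10:7788;
      meta-disk internal;
    }
    on node-b {
      device    /dev/drbd0;
      disk      /dev/xvdf;
      address   10.0.1.11:7788;
      meta-disk internal;
    }
  }

  resource r0-U {                    # stacked on r0, crosses the AZ line
    protocol A;                      # async over the inter-AZ link
    stacked-on-top-of r0 {
      device    /dev/drbd10;
      address   10.0.1.10:7789;
    }
    on node-c {                      # peer in the other AZ
      device    /dev/drbd10;
      disk      /dev/xvdf;
      address   10.0.2.10:7789;
      meta-disk internal;
    }
  }

Note that the stacked device can only be active on whichever node is
currently primary for r0, which is part of why this combination gets
awkward in practice.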
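
And here, purely as a toy model (Python, every name invented; this is
not DRBD code), is the ordering property I'd want the hypothetical
proxy to have: writes from several local devices get stamped with one
global sequence number at arrival, and the receiving side applies them
strictly in that order, so cross-device write ordering survives the
trip to the other AZ:

  import itertools
  import threading

  class OrderingProxy:
      """Local side: accepts write requests from several sources and
      stamps each with a single global sequence number in arrival
      order, so one stream captures the cross-device ordering."""

      def __init__(self, ship):
          self._lock = threading.Lock()
          self._seq = itertools.count()
          self._ship = ship   # callable(seq, device, offset, data)

      def submit(self, device, offset, data):
          # Stamp and ship under one lock so the global sequence
          # matches the order in which requests were accepted.
          with self._lock:
              self._ship(next(self._seq), device, offset, data)

  class RemoteApplier:
      """Remote side: applies requests strictly in sequence order,
      buffering any that arrive early (e.g. over parallel links)."""

      def __init__(self, write):
          self._write = write   # callable(device, offset, data)
          self._next = 0
          self._pending = {}

      def receive(self, seq, device, offset, data):
          self._pending[seq] = (device, offset, data)
          while self._next in self._pending:
              self._write(*self._pending.pop(self._next))
              self._next += 1

The real thing would obviously need acknowledgements, batching, and
crash recovery; the point is just that a single arrival-order sequence
number is enough to preserve ordering across devices.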