Nathan,

This might be a side effect of open files not being replicated. If the VM has written to any files that it still holds open, none of those changes are propagated to the other node until the file is closed. If a gluster node goes down, and it is the node holding the modified open file, you are out of luck. If the VM reopens the file while the up-to-date node is down, it gets the out-of-date copy from the replica. From there the VM might go bonkers, or it might write to that copy as well, in which case you end up with a split brain: two different copies of the file on two different nodes.

3.0 is supposed to address open-file replication, though I haven't had a chance yet to test whether it fixes all the problems we were having with 2.0.8. You might want to check out 3.0pre1 and see if that makes any difference.

--brian

On Nov 12, 2009, at 11:35 AM, Nathan Stratton wrote:

> Should this work? I understand there are issues. I know that self heal
> takes forever, I know that I have to use disable-direct-io, and I know
> that I must use file rather than tap:aio. However, should basic
> redundancy work? When a node crashes, I seem to lose all Xen VMs that
> are using that node, even though I have another node in distribute that
> is working.
>
> My config is simple: 4 nodes using distribute and replicate.
>
> http://share.robotics.net/glusterfs.vol
> http://share.robotics.net/glusterfsd.vol
>
> Any help is appreciated. We are getting ready to buy NetApp, something
> I would rather not do.
>
> Nathan Stratton                         CTO, BlinkMind, Inc.
> nathan at robotics.net                  nathan at blinkmind.com
> http://www.robotics.net                 http://www.blinkmind.com