I did a lot of testing on distributed-replication, and my conclusion was
that 3.3.1's automatic self-heal is not adequate. I ran the QA version of
3.3.2 and was not able to find a fault. Also, if you can get your replica
count up to 3, you can set a quorum of 2, which makes the chances of ever
getting a split brain very rare. One more note -- I don't recommend this
to everyone, but in my scenario I was appending CSV files, and I found
that using the diff option for the self-heal created less data loss.
Again, if you don't understand that option, don't change it. (Both
settings are sketched at the end of this message.)

Also still waiting for PHP to fix my bug:
https://bugs.php.net/bug.php?id=60110

On Mon, Jul 15, 2013 at 1:52 AM, Toby Corkindale <
toby.corkindale at strategicdata.com.au> wrote:

> On 12/07/13 06:44, Michael Peek wrote:
>
>> Hi gurus,
>>
>> So I have a cluster that I've set up and am banging on. It's comprised
>> of four machines with two drives in each machine. (By the way, the
>> 3.2.5 version that comes with stock Ubuntu 12.04 seems to have a lot
>> of bugs/instability. I was screwing it up daily just by putting it
>> through some heavy-use tests. Then I downloaded 3.3.1 from the PPA,
>> and so far things seem a LOT more stable. I haven't managed to break
>> anything yet, although the night is still young.)
>>
>> I'm dumping data to it like mad, and I decided to simulate a
>> filesystem error by remounting half of the cluster's drives in
>> read-only mode with "mount -o remount,ro".
>>
>> The cluster seemed to slow just slightly, but it kept on ticking.
>> Great.
>
> While you're performing your testing, can I suggest you also test the
> following behaviour, to ensure the performance meets your needs.
>
> Fill the volumes up with data, to a point similar to what you expect
> to reach in production use -- not just in terms of disk space, but
> number of files and directories as well. You might need to write a
> small script that can build a simulated directory tree, populated with
> a range of file sizes. (A sketch of such a script follows at the end
> of this message.)
>
> Take one of the nodes offline (or read-only), and then touch and
> modify a large number of files randomly around the volume (also
> sketched below). Imagine that a node was offline for 24 hours, and
> that you're simulating the quantity of writes that would occur in
> total over that time.
>
> Now bring the "failed" node back online and start the healing process
> (the heal commands are sketched below too). Meanwhile, continue to
> simulate client access patterns on the files you were modifying
> earlier. Ensure that performance is still sufficient for your needs.
>
> It's a more complicated test to run, but it's important to measure how
> gluster performs with your workload in the non-ideal circumstances
> that you will eventually hit.
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

--
Follow Me: @Scottix <http://www.twitter.com/scottix>
http://about.me/scottix
Scottix at Gmail.com
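
Sketch of the quorum setup mentioned at the top -- a minimal example
assuming an existing replica-3 volume; "testvol" is a placeholder name,
and the option names are the client-quorum options as documented for the
3.3 series:

    # With quorum-type "fixed" and quorum-count 2, writes are only
    # allowed while at least 2 of the 3 replicas are reachable, which
    # is what makes a split brain very unlikely.
    gluster volume set testvol cluster.quorum-type fixed
    gluster volume set testvol cluster.quorum-count 2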
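
And the self-heal option referred to above -- again a sketch against the
same hypothetical volume name. "diff" heals by copying only the blocks
that changed rather than the whole file, which is why it helped with an
append-only CSV workload:

    # Values are full / diff / reset; "full" recopies the entire file
    # from the good replica, "diff" compares checksums block by block
    # and copies only the blocks that differ.
    gluster volume set testvol cluster.data-self-heal-algorithm diff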
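
For Toby's fill-the-volume step, a minimal sketch of the kind of script
he means. The mount point, directory/file counts, and size range are all
made-up parameters -- tune them to match your expected production load:

    #!/bin/bash
    # Build a simulated directory tree on the mounted gluster volume.
    # /mnt/gluster is an assumed mount point; adjust everything below.
    ROOT=/mnt/gluster/testdata
    for d in $(seq 1 100); do
        dir="$ROOT/dir$d"
        mkdir -p "$dir"
        for f in $(seq 1 50); do
            # Random file size between 1 KB and ~10 MB.
            kb=$(( (RANDOM % 10240) + 1 ))
            dd if=/dev/urandom of="$dir/file$f" bs=1K count=$kb \
                2>/dev/null
        done
    done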
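
For the "node offline for 24 hours" step, one way to simulate the write
churn -- just a sketch, with placeholder paths and counts:

    #!/bin/bash
    # With one node offline (or remounted read-only), touch and append
    # to files picked at random across the volume, so the offline node
    # falls well behind and the later heal has real work to do.
    ROOT=/mnt/gluster/testdata
    mapfile -t files < <(find "$ROOT" -type f)
    for i in $(seq 1 5000); do
        f=${files[RANDOM % ${#files[@]}]}
        echo "modified $(date)" >> "$f"   # append, as with CSV logs
        touch "$f"                        # bump mtime as well
    done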
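
Finally, once the "failed" node is writable again, the heal can be
kicked off and watched from the CLI. Commands as they exist in the 3.3
series (which introduced the self-heal daemon); the brick path and
volume name are still placeholders:

    # Reverse the earlier failure simulation on the affected node.
    mount -o remount,rw /export/brick1

    # Trigger a heal of files that need it, then watch what remains
    # while the simulated client load keeps running.
    gluster volume heal testvol
    gluster volume heal testvol info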