I did a lot of testing on distributed-replication, and my conclusion was
that 3.3.1's automatic self-heal is not adequate. I ran the QA version of
3.3.2 and was not able to find a fault. Also, if you can get your replica
count up to 3, you can set a quorum of 2, which makes the chances of ever
getting a split brain very rare. One more note -- I don't recommend this
to everyone, but in my scenario I was appending CSV files, and I found
that using the diff option for the self-heal created less data loss.
Again, if you don't understand that option, don't change it. (Both
settings are sketched at the end of this message.)

Also still waiting for PHP to fix my bug:
https://bugs.php.net/bug.php?id=60110

On Mon, Jul 15, 2013 at 1:52 AM, Toby Corkindale <
toby.corkindale at strategicdata.com.au> wrote:

> On 12/07/13 06:44, Michael Peek wrote:
>
>> Hi gurus,
>>
>> So I have a cluster that I've set up and am banging on. It's comprised
>> of four machines with two drives in each machine. (By the way, the
>> 3.2.5 version that comes with stock Ubuntu 12.04 seems to have a lot
>> of bugs/instability. I was screwing it up daily just by putting it
>> through some heavy-use tests. Then I downloaded 3.3.1 from the PPA,
>> and so far things seem a LOT more stable. I haven't managed to break
>> anything yet, although the night is still young.)
>>
>> I'm dumping data to it like mad, and I decided to simulate a
>> filesystem error by remounting half of the cluster's drives in
>> read-only mode with "mount -o remount,ro".
>>
>> The cluster seemed to slow just slightly, but it kept on ticking.
>> Great.
>
> While you're performing your testing, can I suggest you also test the
> following behaviour, to ensure the performance meets your needs.
>
> Fill the volumes up with data, to a point similar to what you expect
> to reach in production use -- not just in terms of disk space, but
> number of files and directories as well. You might need to write a
> small script that can build a simulated directory tree, populated with
> a range of file sizes. (A sketch of such a script follows at the end
> of this message.)
>
> Take one of the nodes offline (or read-only), and then touch and
> modify a large number of files randomly around the volume (also
> sketched below). Imagine that a node was offline for 24 hours, and
> that you're simulating the quantity of writes that would occur in
> total over that time.
>
> Now bring the "failed" node back online and start the healing process
> (the heal commands are sketched below too). Meanwhile, continue to
> simulate client access patterns on the files you were modifying
> earlier. Ensure that performance is still sufficient for your needs.
>
> It's a more complicated test to run, but it's important to measure how
> gluster performs with your workload in the non-ideal circumstances
> that you will eventually hit.
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

--
Follow Me: @Scottix <http://www.twitter.com/scottix>
http://about.me/scottix
Scottix at Gmail.com
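
Sketch of the quorum setup mentioned at the top -- a minimal example
assuming an existing replica-3 volume; "testvol" is a placeholder name,
and the option names are the client-quorum options as documented for the
3.3 series:

    # With quorum-type "fixed" and quorum-count 2, writes are only
    # allowed while at least 2 of the 3 replicas are reachable, which
    # is what makes a split brain very unlikely.
    gluster volume set testvol cluster.quorum-type fixed
    gluster volume set testvol cluster.quorum-count 2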
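
And the self-heal option referred to above -- again a sketch against the
same hypothetical volume name. "diff" heals by copying only the blocks
that changed rather than the whole file, which is why it helped with an
append-only CSV workload:

    # Values are full / diff / reset; "full" recopies the entire file
    # from the good replica, "diff" compares checksums block by block
    # and copies only the blocks that differ.
    gluster volume set testvol cluster.data-self-heal-algorithm diff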
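
For Toby's fill-the-volume step, a minimal sketch of the kind of script
he means. The mount point, directory/file counts, and size range are all
made-up parameters -- tune them to match your expected production load:

    #!/bin/bash
    # Build a simulated directory tree on the mounted gluster volume.
    # /mnt/gluster is an assumed mount point; adjust everything below.
    ROOT=/mnt/gluster/testdata
    for d in $(seq 1 100); do
        dir="$ROOT/dir$d"
        mkdir -p "$dir"
        for f in $(seq 1 50); do
            # Random file size between 1 KB and ~10 MB.
            kb=$(( (RANDOM % 10240) + 1 ))
            dd if=/dev/urandom of="$dir/file$f" bs=1K count=$kb \
                2>/dev/null
        done
    done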
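
For the "node offline for 24 hours" step, one way to simulate the write
churn -- just a sketch, with placeholder paths and counts:

    #!/bin/bash
    # With one node offline (or remounted read-only), touch and append
    # to files picked at random across the volume, so the offline node
    # falls well behind and the later heal has real work to do.
    ROOT=/mnt/gluster/testdata
    mapfile -t files < <(find "$ROOT" -type f)
    for i in $(seq 1 5000); do
        f=${files[RANDOM % ${#files[@]}]}
        echo "modified $(date)" >> "$f"   # append, as with CSV logs
        touch "$f"                        # bump mtime as well
    done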
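
Finally, once the "failed" node is writable again, the heal can be
kicked off and watched from the CLI. Commands as they exist in the 3.3
series (which introduced the self-heal daemon); the brick path and
volume name are still placeholders:

    # Reverse the earlier failure simulation on the affected node.
    mount -o remount,rw /export/brick1

    # Trigger a heal of files that need it, then watch what remains
    # while the simulated client load keeps running.
    gluster volume heal testvol
    gluster volume heal testvol info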