> Self heal happens whenever a lookup happens on an inconsistent file.
> The commands ls -laR and find do a lookup on all the files recursively
> under the directory we specify.

Let's take an example:

- replica 2 cluster (2 peers) with 500K files
- during the weekend the peer we call '1' disconnects for a short time (say 30 minutes); when the connection comes up again, about 10K files were modified or created
- on Monday the administrator has no knowledge of the network glitch (let's suppose he didn't implement any sort of network logging system)
- after 3 days, 1K of the 10K files modified during the network glitch are still unaccessed; in the afternoon peer '2' hard crashes due to a total hardware failure (motherboard replacement needed)

Now we have 1K files inaccessible or obsolete!

I think that when a peer comes back, self-healing should start automatically. Of course we could write a shell script that tests the network and issues an 'ls -laR' command when needed, but this is a sort of dirty solution.
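Just to sketch what I mean (the mount point, state file and cron usage
here are only placeholders, and I'm assuming 'gluster peer status'
reports "Disconnected" for a peer that is down), such a workaround could
look roughly like this:

#!/bin/sh
# Placeholder paths: adjust to the real mount point of the volume.
MOUNT=/mnt/gluster
STATE=/var/run/gluster-heal-needed

# If any peer is currently down, just remember that a heal will be needed.
if gluster peer status | grep -q 'Disconnected'; then
    touch "$STATE"
    exit 0
fi

# All peers are reachable again and a glitch was recorded earlier:
# walk the whole volume so every file gets a lookup (and therefore a heal).
if [ -f "$STATE" ]; then
    find "$MOUNT" -print0 | xargs --null stat >/dev/null
    rm -f "$STATE"
fi

Run from cron on one of the clients this would at least close the window
automatically, but I'd still prefer glusterd to handle it by itself.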
Raf

> Pranith.
>
> ----- Original Message -----
> From: "Mohit Anchlia" <mohitanchlia at gmail.com>
> To: "Pranith Kumar. Karampuri" <pranithk at gluster.com>, gluster-users at gluster.org
> Sent: Wednesday, March 16, 2011 3:19:13 AM
> Subject: Re: Best practices after a peer failure?
>
> I thought self-healing is possible only after we run "ls -alR or find
> ..". It looks like self-healing is supposed to work when a dead node is
> brought up, is that true?
>
> On Tue, Mar 15, 2011 at 6:07 AM, Pranith Kumar. Karampuri
> <pranithk at gluster.com> wrote:
>> hi R.C.,
>>     Could you please give the exact steps when you log the bug. Please
>> also give the output of gluster peer status on both the machines after
>> the restart, and zip the files under /usr/local/var/log/glusterfs/ and
>> /etc/glusterd on both the machines when this issue happens. This should
>> help us debug the issue.
>>
>> Thanks
>> Pranith.
>>
>> ----- Original Message -----
>> From: "R.C." <milanraf at gmail.com>
>> To: gluster-users at gluster.org
>> Sent: Tuesday, March 15, 2011 4:14:24 PM
>> Subject: Re: Best practices after a peer failure?
>>
>> I've figured out the problem.
>>
>> If you mount the GlusterFS volume with the native client on a peer and
>> another peer crashes, that peer doesn't self-heal after its reboot.
>>
>> Should I put this issue in the bug tracker?
>>
>> Bye
>>
>> Raf
>>
>> ----- Original Message -----
>> From: "R.C." <milanraf at gmail.com>
>> To: <gluster-users at gluster.org>
>> Sent: Monday, March 14, 2011 11:41 PM
>> Subject: Best practices after a peer failure?
>>
>>> Hello to the list.
>>>
>>> I'm practicing GlusterFS in various topologies by means of multiple
>>> VirtualBox VMs.
>>>
>>> As a standard system administrator, I'm mainly interested in disaster
>>> recovery scenarios. The first is a replica 2 configuration with one
>>> peer crashing (actually, the VM being stopped abruptly) while data is
>>> being written to the volume.
>>> After rebooting the stopped VM and relaunching the gluster daemon
>>> (service glusterd start), the cluster doesn't start healing by itself.
>>> I've also tried the suggested commands:
>>> find <gluster-mount> -print0 | xargs --null stat >/dev/null
>>> and
>>> find <gluster-mount> -type f -exec dd if='{}' of=/dev/null bs=1M \; > /dev/null 2>&1
>>> without success.
>>> A rebalance command recreates the replicas but, when accessing the
>>> cluster, the always-alive client is the only one committing data to
>>> disk.
>>>
>>> Where am I misoperating?
>>>
>>> Thank you for your support.
>>>
>>> Raf
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users