Forcing AFR self-heal (was Re: Gluster client 32bit)

jdarcy at redhat.com (Jeff Darcy) · Tue, 16 Nov 2010 20:53:13 -0500

On 11/16/2010 07:54 PM, Craig Carl wrote:
> On 11/16/2010 03:07 PM, Stephan von Krawczynski wrote:
>> which files are not in sync in a replication setup? There is no trivial
>> answer to this question I already brought up in early 2.X development
>> phase... How can you sell someone a storage platform if you're unable to
>> answer such an essential question? Really, nobody needed auto-healing.
>> All you need is the answer to this question and then stat exactly this
>> file list at a time _of your choice_.
>
> The sync question you brought up is only an issue in the rare case of
> split brain (if I understand the scenario you've described). Split
> brain is a difficult problem with no answer right now. Gluster 3.1
> added much more aggressive locking to reduce the possibility of split
> brain. The process you described as "...the daemons are talking with
> each other about whatever..." will also reduce the likelihood of split
> brain by ensuring that client and server vol files are identical across
> the entire cluster; mismatched vol files are the cause of the vast
> majority of split brain issues with Gluster.
> Auto-heal is slow. We have some processes along the lines you are
> thinking of; please let me know if these address some of your ideas
> around stat:
>
> #cd <backend device>
> #find ./ -type f -exec stat <gluster mount>/"{}" \;
>
> This walks only the files stored on that device and stats each one
> through the Gluster mount, so only those files are healed.
>
> If you know when the failure you want to recover from happened, this
> is even faster:
>
> #cd <backend device>
> #find ./ -type f -mmin -<minutes since failure + some extra> -exec stat <gluster mount>/"{}" \;
>
> This heals only the files on that device that changed within that
> window.
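
A minimal sketch of the same idea as a reusable POSIX sh function, assuming BRICK_DIR is a brick's backend directory and MOUNT_DIR is a client mount of the same volume. The name `heal_from_brick` and its optional minutes argument are hypothetical, not part of any Gluster tool. It enumerates files on the brick but stats them through the mount, since it is the lookup through the client that triggers AFR self-heal; a plain stat on the backend path does not. `-mmin` is supported by GNU and BSD find but is not POSIX.

```shell
#!/bin/sh
# Sketch only: trigger AFR self-heal for the files stored on one brick by
# looking each of them up through a GlusterFS client mount.
# Usage: heal_from_brick BRICK_DIR MOUNT_DIR [MINUTES]
# heal_from_brick is a hypothetical helper, not part of Gluster.
# The body runs in a subshell so the cd does not leak to the caller.
heal_from_brick() (
    brick=$1
    mount=$2
    window=$3   # optional: only files modified within the last N minutes

    cd "$brick" || exit 1

    # Build the optional age filter as positional parameters so an
    # empty window adds nothing to the find command line.
    if [ -n "$window" ]; then
        set -- -mmin "-$window"
    else
        set --
    fi

    # Enumerate files on the brick, but stat them via the mount: the
    # lookup through the client-side replicate translator is what heals.
    # Each path that stats successfully is printed.
    find . -type f "$@" -exec sh -c '
        p="$0/${1#./}"
        stat "$p" > /dev/null && printf "%s\n" "$p"
    ' "$mount" {} \;
)
```

For example, `heal_from_brick /export/brick1 /mnt/gluster 90` would visit only files on that brick changed in the last 90 minutes; both paths are placeholders. Printing each path is just a convenience for logging which files were looked up.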

See also http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2088 
which is an enhancement request addressing exactly this issue.
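
As a rough approximation of the list Stephan asks for, AFR records pending operations in `trusted.afr.*` extended attributes on each backend file; a non-zero value means that brick holds changes not yet applied to the corresponding replica. The sketch below filters the text output of `getfattr -R -d -m trusted.afr -e hex .` run inside a brick (reading the `trusted.*` namespace needs root). The attribute naming `trusted.afr.<volume>-client-<N>` is an assumption based on 3.1, so the filter accepts any `trusted.afr.` attribute; `list_pending` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print paths whose AFR changelog xattrs show pending operations.
# Input: text produced by, e.g.,
#   getfattr -R -d -m trusted.afr -e hex .
# run inside a brick directory (the trusted.* namespace needs root).
# Attribute names are an assumption; any "trusted.afr." entry is checked.
list_pending() {
    awk '
        # "# file: <path>" starts a new record; remember the path.
        /^# file: / { file = substr($0, 9); next }
        # "trusted.afr.<name>=0x<hex>": any non-zero digit means pending.
        /^trusted\.afr\./ {
            val = substr($0, index($0, "=") + 1)
            sub(/^0x/, "", val)
            if (val ~ /[^0]/ && !(file in seen)) {
                seen[file] = 1
                print file
            }
        }
    '
}
```

Piping `getfattr -R -d -m trusted.afr -e hex . 2>/dev/null | list_pending` would give a list that could then be stat'ed through the mount at a time of your choosing. It may also report files (or directories) that are simply in the middle of a write.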