On 09/07/13 15:38, Bobby Jacob wrote:
> Hi,
>
> I have a 2-node gluster with 3 TB storage.
>
> 1) I believe the "glusterfsd" is responsible for the self healing between
> the 2 nodes.
>
> 2) Due to some network error, the replication stopped for some reason but
> the application was accessing the data from node1. When I manually try
> to start the "glusterfsd" service, it's not starting.
>
> Please advise on how I can maintain the integrity of the data so that we
> have all the data in both the locations.

There were some bugs in the self-heal daemon present in 3.3.0 and 3.3.1.
Our systems see the SHD crash out with segfaults quite often, and it does
not recover. I reported this bug a long time ago, and it was fixed in
trunk relatively quickly -- however version 3.3.2 has still not been
released, despite the fix being found six months ago. I find this quite
disappointing.

T
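
As a rough sketch of a workaround for the original question (assuming a
standard replicate volume; "VOLNAME" is a placeholder for your volume name,
and the "service glusterd restart" invocation may differ by distro): restart
glusterd, which respawns glustershd (the self-heal daemon), then trigger and
verify a heal manually:

    # restart the management daemon; it respawns glustershd
    service glusterd restart

    # list files that still need healing (replace VOLNAME with your volume)
    gluster volume heal VOLNAME info

    # force a full self-heal across the replica pair
    gluster volume heal VOLNAME full

    # confirm the Self-heal Daemon shows as online on both nodes
    gluster volume status VOLNAME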