Self-Heal Daemon not Running

On 09/25/2013 06:16 AM, Andrew Lau wrote:
> That's where I found the 200+ entries
>
> [root@hv01]# gluster volume heal STORAGE info split-brain
> Gathering Heal info on volume STORAGE has been successful
>
> Brick hv01:/data1
> Number of entries: 271
> at                    path on brick
>
> 2013-09-25 00:04:29 /6682d31f-39ce-4896-99ef-14e1c9682585/dom_md/ids
> 2013-09-25 00:04:29 
> /6682d31f-39ce-4896-99ef-14e1c9682585/images/5599c7c7-0c25-459a-9d7d-80190a7c739b/0593d351-2ab1-49cd-a9b6-c94c897ebcc7
> 2013-09-24 23:54:29 <gfid:9c83f7e4-6982-4477-816b-172e4e640566>
> 2013-09-24 23:54:29 <gfid:91e98909-c217-417b-a3c1-4cf0f2356e14>
> <snip>
>
> Brick hv02:/data1
> Number of entries: 0
>
> When I run the same command on hv02, it will show the reverse (the 
> other node having 0 entries).
>
> I remember last time having to delete these files individually on 
> another split-brain case, but I was hoping there was a better solution 
> than going through 200+ entries.
>
While I haven't tried it out myself, Jeff Darcy has written a script 
(https://github.com/jdarcy/glusterfs/tree/heal-script/extras/heal_script) which 
helps automate the process. He has detailed its usage in his blog post 
http://hekafs.org/index.php/2012/06/healing-split-brain/
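Failing that, the usual manual fix is to delete the stale copy on the brick you trust less, together with its gfid hard link under that brick's .glusterfs directory, and then trigger a heal. The `<gfid:...>` entries in your heal info output map onto that directory by the first two pairs of hex digits. A rough sketch of the mapping (the brick path and gfid below are just examples taken from this thread; verify against your own setup before deleting anything):

```shell
#!/bin/sh
# Sketch: map a gfid from "heal info" output to the hard link that
# GlusterFS keeps under <brick>/.glusterfs/<xx>/<yy>/<full-gfid>,
# where xx and yy are the first two pairs of hex digits of the gfid.
gfid_to_path() {
    brick=$1
    gfid=$2
    echo "$brick/.glusterfs/$(echo "$gfid" | cut -c1-2)/$(echo "$gfid" | cut -c3-4)/$gfid"
}

gfid_to_path /data1 9c83f7e4-6982-4477-816b-172e4e640566
# -> /data1/.glusterfs/9c/83/9c83f7e4-6982-4477-816b-172e4e640566

# On the brick holding the stale copy, you would remove BOTH the named
# file and this hard link, then run (volume name from this thread):
#   gluster volume heal STORAGE full
```

Be careful to only ever remove files directly on the brick, never through the mounted volume.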

Hope this helps.
-Ravi
> Cheers.
>
>
> On Wed, Sep 25, 2013 at 10:39 AM, Mohit Anchlia 
> <mohitanchlia at gmail.com <mailto:mohitanchlia at gmail.com>> wrote:
>
>     What's the output of
>     gluster volume heal $VOLUME info split-brain
>
>
>     On Tue, Sep 24, 2013 at 5:33 PM, Andrew Lau <andrew at andrewklau.com
>     <mailto:andrew at andrewklau.com>> wrote:
>
>         Found the BZ
>         https://bugzilla.redhat.com/show_bug.cgi?id=960190 - so I
>         restarted one of the volumes and that seems to have restarted
>         all the daemons again.
>
>         Self heal started again, but I seem to have split-brain issues
>         everywhere. There's over 100 different entries on each node;
>         what's the best way to restore this now, short of manually
>         going through and deleting 200+ files? It looks like a full
>         split-brain, as the file sizes on the two nodes are out of
>         balance by about 100GB or so.
>
>         Any suggestions would be much appreciated!
>
>         Cheers.
>
>         On Tue, Sep 24, 2013 at 10:32 PM, Andrew Lau
>         <andrew at andrewklau.com <mailto:andrew at andrewklau.com>> wrote:
>
>             Hi,
>
>             Right now, I have a 2x1 replica. Ever since I had to
>             reinstall one of the gluster servers, there's been issues
>             with split-brain. The self-heal daemon doesn't seem to be
>             running on either of the nodes.
>
>             To reinstall the gluster server (the original brick data
>             was intact but the OS had to be reinstalled)
>             - Reinstalled gluster
>             - Copied over the old uuid from backup
>             - gluster peer probe
>             - gluster volume sync $othernode all
>             - mount -t glusterfs localhost:STORAGE /mnt
>             - find /mnt -noleaf -print0 | xargs --null stat >/dev/null
>             2>/var/log/glusterfs/mnt-selfheal.log
>
>             I let it resync and it was working fine, at least so I
>             thought. I just came back a few days later to see there's
>             a mismatch in the brick volumes. One is 50GB ahead of
>             the other.
>
>             # gluster volume heal STORAGE info
>             Status: self-heal-daemon is not running on
>             966456a1-b8a6-4ca8-9da7-d0eb96997cbe
>
>             /var/log/gluster/glustershd.log doesn't seem to have any
>             recent logs, only those from when the two original gluster
>             servers were running.
>
>             # gluster volume status
>
>             Self-heal Daemon on localhost    N/A    N    N/A
>
>             Any suggestions would be much appreciated!
>
>             Cheers
>             Andrew.
>
>
>
>         _______________________________________________
>         Gluster-users mailing list
>         Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>         http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
>
