That's where I found the 200+ entries

[root at hv01] gluster volume heal STORAGE info split-brain
Gathering Heal info on volume STORAGE has been successful

Brick hv01:/data1
Number of entries: 271
at                    path on brick
2013-09-25 00:04:29 /6682d31f-39ce-4896-99ef-14e1c9682585/dom_md/ids
2013-09-25 00:04:29 /6682d31f-39ce-4896-99ef-14e1c9682585/images/5599c7c7-0c25-459a-9d7d-80190a7c739b/0593d351-2ab1-49cd-a9b6-c94c897ebcc7
2013-09-24 23:54:29 <gfid:9c83f7e4-6982-4477-816b-172e4e640566>
2013-09-24 23:54:29 <gfid:91e98909-c217-417b-a3c1-4cf0f2356e14>
<snip>

Brick hv02:/data1
Number of entries: 0

When I run the same command on hv02, it shows the reverse (the other node's brick having 0 entries).

I remember having to delete these files individually in another split-brain case, but I was hoping there was a better solution than going through 200+ entries.

Cheers.

On Wed, Sep 25, 2013 at 10:39 AM, Mohit Anchlia <mohitanchlia at gmail.com> wrote:

> What's the output of
>
> gluster volume heal $VOLUME info split-brain
>
>
> On Tue, Sep 24, 2013 at 5:33 PM, Andrew Lau <andrew at andrewklau.com> wrote:
>
>> Found the BZ https://bugzilla.redhat.com/show_bug.cgi?id=960190 - so I
>> restarted one of the volumes and it seems to have restarted all the daemons
>> again.
>>
>> Self heal started again, but I seem to have split-brain issues
>> everywhere. There are over 100 different entries on each node; what's the
>> best way to recover from this now, short of manually going through and
>> deleting 200+ files? It looks like a full split-brain, as the file sizes on
>> the two nodes are out of balance by about 100GB or so.
>>
>> Any suggestions would be much appreciated!
>>
>> Cheers.
>>
>> On Tue, Sep 24, 2013 at 10:32 PM, Andrew Lau <andrew at andrewklau.com> wrote:
>>
>>> Hi,
>>>
>>> Right now I have a 2x1 replica. Ever since I had to reinstall one of
>>> the gluster servers, there have been issues with split-brain. The self-heal
>>> daemon doesn't seem to be running on either of the nodes.
>>>
>>> To reinstall the gluster server (the original brick data was intact but
>>> the OS had to be reinstalled):
>>> - Reinstalled gluster
>>> - Copied over the old UUID from backup
>>> - gluster peer probe
>>> - gluster volume sync $othernode all
>>> - mount -t glusterfs localhost:STORAGE /mnt
>>> - find /mnt -noleaf -print0 | xargs --null stat >/dev/null
>>> 2>/var/log/glusterfs/mnt-selfheal.log
>>>
>>> I let it resync and it was working fine, at least so I thought. I just
>>> came back a few days later to see there's a mismatch in the brick
>>> volumes. One is 50GB ahead of the other.
>>>
>>> # gluster volume heal STORAGE info
>>> Status: self-heal-daemon is not running on
>>> 966456a1-b8a6-4ca8-9da7-d0eb96997cbe
>>>
>>> /var/log/glusterfs/glustershd.log doesn't seem to have any recent logs,
>>> only those from when the two original gluster servers were running.
>>>
>>> # gluster volume status
>>>
>>> Self-heal Daemon on localhost    N/A    N    N/A
>>>
>>> Any suggestions would be much appreciated!
>>>
>>> Cheers
>>> Andrew.
>>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>
>
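
One way to avoid working through the 200+ entries by hand is to script the usual manual split-brain cleanup: on the brick whose copies you have decided to discard, delete each reported file together with its gfid hardlink under .glusterfs, then let self-heal copy the good version back from the other brick. The loop below is only a rough sketch, not a tested procedure. It assumes the volume is STORAGE, the brick to clean is /data1 on hv01, the paths contain no spaces, and that hv02 really does hold the copies you want to keep -- spot-check a few files first (stat and "getfattr -d -m trusted.afr -e hex" on both bricks) and try it on a single entry before looping, because deleting the wrong side loses data.

# rough sketch, run with bash on hv01 (the side being discarded) -- verify before use
BRICK=/data1
gluster volume heal STORAGE info split-brain \
  | awk '$3 ~ /^\// {print $3}' | sort -u \
  | while read f; do
      # gfid xattr comes back as 0x<32 hex chars>; strip the leading 0x
      g=$(getfattr -n trusted.gfid -e hex "$BRICK$f" 2>/dev/null \
          | awk -F= '/trusted.gfid/ {print substr($2,3)}')
      [ -z "$g" ] && continue
      # remove the stale copy and its hardlink in .glusterfs so the good brick can heal it back
      rm -f "$BRICK$f"
      rm -f "$BRICK/.glusterfs/${g:0:2}/${g:2:2}/${g:0:8}-${g:8:4}-${g:12:4}-${g:16:4}-${g:20:12}"
    done
# afterwards re-trigger healing, e.g. the find/stat crawl from a client mount as in the
# earlier mail, or: gluster volume heal STORAGE full

The <gfid:...> entries in the heal output carry no path, so those would have to be chased down directly via their .glusterfs/xx/yy/<gfid> links on the brick instead.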