Broken gluster ... Help required


 



On 07/04/2013 10:25 PM, HL wrote:
> On 04/07/2013 07:39 PM, Vijay Bellur wrote:
>> On 07/04/2013 07:36 PM, HL wrote:
>>> Hello list,
>>>
>>> I have a 2-node replica glusterfs setup
>>>
>>> Volume Name: GLVol
>>> Type: Replicate
>>> Volume ID: 2106bc3d-d184-42db-ab6e-e547ff816113
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: node01:/export/fm_glust
>>> Brick2: node02:/export/fe_glust
>>> Options Reconfigured:
>>> auth.allow: 172.16.35.*
>>> geo-replication.indexing: on
>>>
>>>
>>> After a cable failure I have an unstable system.
>>>
>>> gluster volume heal GLVol info split-brain
>>>
>>> Gives about 1000 entries like these:
>>>
>>> 2013-07-04 16:46:10 <gfid:ce073ea9-a95a-4c5e-b2bd-7db1e26cbad7>
>>> 2013-07-04 16:46:10 <gfid:0d7ff6b2-5ed1-4584-b0e3-9f0c723463b8>
>>> 2013-07-04 16:46:10 /vz_data/mySQLDBS/var/lib/mysql/ib_logfile0
>>>
>>> I've found a script in the users list on how to deal with
>>> /vz_data/mySQLDBS/var/lib/mysql/ib_logfile0
>>> that is, files with a known path,
>>> but I don't know how to deal with the hex entries ...  they seem to me
>>> to be orphans ...
>>
>> The hex entries are "gfid"s that GlusterFS uses to identify files; they
>> are not orphan objects. A gfid that the server cannot immediately resolve
>> to a path appears in its hex representation.
>>
>>>
>>> Is it ok to delete them ???
>>
>> Unless the files are in split brain, it is not a good idea to delete
>> them. Letting the self-heal daemon heal these files would be a better
>> approach.
>>
>
> Thanks, Vijay
>
> I am not sure I can follow.
> As I stated, the output of
> gluster volume heal GLVol info split-brain

I am sorry, I missed the split-brain part in the original post.

>
> was
>
> 2013-07-04 16:46:10 <gfid:ce073ea9-a95a-4c5e-b2bd-7db1e26cbad7>
> 2013-07-04 16:46:10 <gfid:0d7ff6b2-5ed1-4584-b0e3-9f0c723463b8>
> 2013-07-04 16:46:10 /vz_data/mySQLDBS/var/lib/mysql/ib_logfile0
>
> So it seems to me that they are in split-brain
>
> Other than that ..
> How can I make sure that the self-heal daemon
> is running and is indeed healing the files?
>
> gluster volume status all
> shows
>
> Status of volume: GLVol
> Gluster process                              Port     Online   Pid
> ------------------------------------------------------------------------------
> Brick node01:/export/fm_glust                24009    Y        2614
> Brick node02:/export/fe_glust                24009    Y        2746
> NFS Server on localhost                      38467    Y        2752
> Self-heal Daemon on localhost                N/A      Y        2758
> NFS Server on node01                         38467    Y        2626
> Self-heal Daemon on node01                   N/A      Y        2632

Volume status is the right way to determine whether the self-heal daemon is 
running.
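
For example (using your volume name; the process check is just a quick 
sanity test on each node):

    gluster volume status GLVol
    # or look for the self-heal daemon process directly:
    ps aux | grep glustershd

If the Self-heal Daemon ever shows up as not online, "gluster volume start 
GLVol force" should respawn it without disturbing bricks that are already 
running.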

>
>
> So
> Delete them or not???

To resolve split-brain, you will need to delete or move the copy of each 
affected file on the replica that you don't consider correct.
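
A rough sketch of what that can look like, assuming node01 holds the copies 
you want to keep (illustrative only; adjust brick paths to your setup and 
double-check every path before removing anything):

    # On node02, the brick whose copy you are discarding
    # (path taken from your split-brain listing):
    F=/export/fe_glust/vz_data/mySQLDBS/var/lib/mysql/ib_logfile0

    # Note the file's gfid; its hard link under .glusterfs must go as well.
    # getfattr prints the same hex digits that make up the gfid file name.
    getfattr -n trusted.gfid -e hex "$F"

    # Remove (or mv aside) the bad copy and its gfid link. The first two and
    # next two hex characters of the gfid form the directory levels, e.g. a
    # gfid starting with ce07 lives under .glusterfs/ce/07/.
    rm "$F"
    rm /export/fe_glust/.glusterfs/<aa>/<bb>/<full-gfid>

    # Then stat the file from a client mount, or kick off a heal so the good
    # copy is copied back from node01:
    gluster volume heal GLVol full

For the plain <gfid:...> entries you can usually recover the path first: for 
regular files the gfid file under .glusterfs is a hard link to the real file, 
so something like

    find /export/fe_glust -samefile \
        /export/fe_glust/.glusterfs/ce/07/ce073ea9-a95a-4c5e-b2bd-7db1e26cbad7 \
        -not -path '*/.glusterfs/*'

will show which file that gfid refers to.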

>
> If self-healing is making any progress:
> my bricks are about 500G in size, containing about 400K file entries,
> and the two bricks are connected with a dedicated 1G NIC.
> How long will it take to heal ???

Resolving split-brain needs manual intervention. You can use "volume 
heal <volname> info" to see the activity of the self-heal daemon.
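
For example:

    gluster volume heal GLVol info             # entries that still need healing
    gluster volume heal GLVol info healed      # what the daemon has healed recently
    gluster volume heal GLVol info heal-failed # entries it could not heal on its own

(The "healed" and "heal-failed" sub-commands should be available on the 
3.3/3.4 releases.)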

Regards,
Vijay

