Hi,

Thanks Joe, the split-brain files have been removed as you recommended. How can we deal with this situation going forward, given that there is no document that explains how to resolve such issues?

[root@KWTOCUATGS001 83]# gluster volume heal glustervol info
Gathering Heal info on volume glustervol has been successful

Brick KWTOCUATGS001:/mnt/cloudbrick
Number of entries: 14
/Tommy Kolega
<gfid:10429dd5-180c-432e-aa4a-8b1624b86f4b>
<gfid:7883309e-8764-4cf6-82a6-d8d81cb60dd7>
<gfid:3e3d77d6-2818-4766-ae3b-4f582118321b>
<gfid:8bd03482-025c-4c09-8704-60be9ddfdfd8>
<gfid:2685e11a-4eb9-4a92-883e-faa50edfa172>
<gfid:24d83cbd-e621-4330-b0c1-ae1f0fd2580d>
<gfid:197e50fa-bfc0-4651-acaa-1f3d2d73936f>
<gfid:3e094ee9-c9cf-4010-82f4-6d18c1ab9ca0>
<gfid:77783245-4e03-4baf-8cb4-928a57b266cb>
<gfid:70340eaa-7967-41d0-855f-36add745f16f>
<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>
<gfid:b1651457-175a-43ec-b476-d91ae8b52b0b>
/Tommy Kolega/lucene_index

Brick KWTOCUATGS002:/mnt/cloudbrick
Number of entries: 15
<gfid:7883309e-8764-4cf6-82a6-d8d81cb60dd7>
<gfid:0454d0d2-d432-4ac8-8476-02a8522e4a6a>
<gfid:a7e88fd1-6e32-40ab-90f6-ea452242a7c6>
<gfid:00389876-700f-4351-b00e-1c57496eed89>
<gfid:0cd48d89-1dd2-47f6-9311-58224b19446e>
<gfid:081c6657-301a-42a4-9f95-6eeba6c67413>
<gfid:565f1358-449c-45e2-8535-93b5632c0d1e>
<gfid:9c7010ac-5c11-4561-8b86-5c4d6561f34e>
<gfid:25fd406f-63e0-4037-bb01-da282cbe4d76>
<gfid:a109c429-5885-499e-8711-09fdccd396f2>
<gfid:5a8fd3bf-9215-444c-b974-5c280f5699a6>
/Tommy Kolega
/Tommy Kolega/lucene_index
<gfid:c49e9d76-e5d4-47dc-9cf1-3f858f6d07ea>
<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>
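For reference, each <gfid:...> entry above identifies a file by UUID rather than by path; the object lives under the brick's .glusterfs tree, indexed by the first two pairs of hex characters of the gfid. A minimal sketch of translating an entry to its on-brick path, assuming the brick root /mnt/cloudbrick shown above:

    # Translate a gfid reported by "heal info" into its .glusterfs path
    GFID=10429dd5-180c-432e-aa4a-8b1624b86f4b    # first gfid entry above
    BRICK=/mnt/cloudbrick                        # brick root from the output above
    echo "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
    # -> /mnt/cloudbrick/.glusterfs/10/42/10429dd5-180c-432e-aa4a-8b1624b86f4b

For regular files that .glusterfs entry is a hard link to the actual file (directories appear as symlinks), so it can be inspected directly with stat or ls -li.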
Thanks & Regards,
Bobby Jacob

-----Original Message-----
From: Joe Julian [mailto:joe@xxxxxxxxxxxxxxxx]
Sent: Tuesday, December 10, 2013 7:59 AM
To: Bobby Jacob
Cc: gluster-users@xxxxxxxxxxx
Subject: Re: Self Heal Issue GlusterFS 3.3.1

On Tue, 2013-12-03 at 05:47 +0000, Bobby Jacob wrote:
> Hi,
>
> I'm running GlusterFS 3.3.1 on CentOS 6.4.
>
> Output of "gluster volume status":
>
> Status of volume: glustervol
> Gluster process                          Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick KWTOCUATGS001:/mnt/cloudbrick      24009   Y       20031
> Brick KWTOCUATGS002:/mnt/cloudbrick      24009   Y       1260
> NFS Server on localhost                  38467   Y       43320
> Self-heal Daemon on localhost            N/A     Y       43326
> NFS Server on KWTOCUATGS002              38467   Y       5842
> Self-heal Daemon on KWTOCUATGS002        N/A     Y       5848
>
> Self-heal stops working: the application writes to only one brick and the data doesn't replicate. When I check /var/log/glusterfs/glustershd.log I see the following:
>
> [2013-12-03 05:42:32.033563] W [socket.c:410:__socket_keepalive] 0-socket: failed to set keep idle on socket 8
> [2013-12-03 05:42:32.033646] W [socket.c:1876:socket_server_event_handler] 0-socket.glusterfsd: Failed to set keep-alive: Operation not supported
> [2013-12-03 05:42:32.790473] I [client-handshake.c:1614:select_server_supported_programs] 0-glustervol-client-1: Using Program GlusterFS 3.3.2, Num (1298437), Version (330)
> [2013-12-03 05:42:32.790840] I [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-1: Connected to 172.16.95.153:24009, attached to remote volume '/mnt/cloudbrick'.
> [2013-12-03 05:42:32.790884] I [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-1: Server and Client lk-version numbers are not same, reopening the fds
> [2013-12-03 05:42:32.791003] I [afr-common.c:3685:afr_notify] 0-glustervol-replicate-0: Subvolume 'glustervol-client-1' came back up; going online.
> [2013-12-03 05:42:32.791161] I [client-handshake.c:453:client_set_lk_version_cbk] 0-glustervol-client-1: Server lk version = 1
> [2013-12-03 05:42:32.795103] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:a7e88fd1-6e32-40ab-90f6-ea452242a7c6> failed on child glustervol-client-0 (Transport endpoint is not connected)
> [2013-12-03 05:42:32.798064] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:081c6657-301a-42a4-9f95-6eeba6c67413> failed on child glustervol-client-0 (Transport endpoint is not connected)
> [2013-12-03 05:42:32.799278] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:565f1358-449c-45e2-8535-93b5632c0d1e> failed on child glustervol-client-0 (Transport endpoint is not connected)
> [2013-12-03 05:42:32.800636] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:9c7010ac-5c11-4561-8b86-5c4d6561f34e> failed on child glustervol-client-0 (Transport endpoint is not connected)
> [2013-12-03 05:42:32.802223] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:25fd406f-63e0-4037-bb01-da282cbe4d76> failed on child glustervol-client-0 (Transport endpoint is not connected)
> [2013-12-03 05:42:32.803339] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:a109c429-5885-499e-8711-09fdccd396f2> failed on child glustervol-client-0 (Transport endpoint is not connected)
> [2013-12-03 05:42:32.804308] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk] 0-glustervol-replicate-0: open of <gfid:5a8fd3bf-9215-444c-b974-5c280f5699a6> failed on child glustervol-client-0 (Transport endpoint is not connected)
> [2013-12-03 05:42:32.804877] I [client-handshake.c:1614:select_server_supported_programs] 0-glustervol-client-0: Using Program GlusterFS 3.3.2, Num (1298437), Version (330)
> [2013-12-03 05:42:32.807517] I [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-0: Connected to 172.16.107.154:24009, attached to remote volume '/mnt/cloudbrick'.
> [2013-12-03 05:42:32.807562] I [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-0: Server and Client lk-version numbers are not same, reopening the fds
> [2013-12-03 05:42:32.810357] I [client-handshake.c:453:client_set_lk_version_cbk] 0-glustervol-client-0: Server lk version = 1
> [2013-12-03 05:42:32.827437] E [afr-self-heal-data.c:764:afr_sh_data_fxattrop_fstat_done] 0-glustervol-replicate-0: Unable to self-heal contents of '<gfid:1262d40d-46a3-4e57-b07b-0fcc972c8403>' (possible split-brain). Please delete the file from all but the preferred subvolume.

That file is at $brick/.glusterfs/12/62/1262d40d-46a3-4e57-b07b-0fcc972c8403

Try picking one copy to remove, as the log message says.
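A minimal sketch of that step, assuming the brick mount point /mnt/cloudbrick used elsewhere in this thread; inspect both copies before deleting anything, and remove only from the replica whose copy is bad:

    # Run on each of the two servers to compare the copies (size, mtime, perms)
    BRICK=/mnt/cloudbrick    # assumed brick root from this thread
    GFID=1262d40d-46a3-4e57-b07b-0fcc972c8403
    stat "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

    # Then, on the one server whose copy you decided is stale:
    rm "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

If that gfid file is also hard-linked to a named file on the same brick, the named link still carries the stale data; the next step below shows how to find and remove those links as well.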
> [2013-12-03 05:42:39.205157] E [afr-self-heal-metadata.c:472:afr_sh_metadata_fix] 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of '<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>' (possible split-brain). Please fix the file on all backend volumes
> [2013-12-03 05:42:39.215793] E [afr-self-heal-metadata.c:472:afr_sh_metadata_fix] 0-glustervol-replicate-0: Unable to self-heal permissions/ownership of '<gfid:c0660768-289f-48ac-b8e5-e5b5a3a4b965>' (possible split-brain). Please fix the file on all backend volumes

If that doesn't allow it to heal, you may need to find which filenames the gfid file is hard-linked to. Run "ls -li" on the gfid file at the path I demonstrated earlier. With that inode number in hand, run:

find $brick -inum $inode_number

Once you know which filenames it's linked with, remove all linked copies from all but one replica. Then the self-heal can continue successfully.
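A sketch of that sequence, reusing the gfid from the data split-brain log entry above as the example; $inode_number stands in for the number printed in the first column of ls -li:

    BRICK=/mnt/cloudbrick    # assumed brick root from this thread
    GFID=1262d40d-46a3-4e57-b07b-0fcc972c8403

    # The first column of ls -li is the inode number shared by all hard links
    ls -li "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

    # List every path on this brick that shares that inode
    find "$BRICK" -inum $inode_number

    # Remove the gfid file and each path that find reported, on this brick
    # only, leaving the replica with the good copy untouched. A heal can
    # then be re-triggered with:
    gluster volume heal glustervol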