gluster 3.7.3 - volume heal info hangs - unknown heal status

Hi!

Our provider had network maintenance last night, so 2 of our 4 servers were disconnected and later reconnected. Since we knew this was coming, we shifted all workload off the affected servers. This morning most of the cluster seems fine, but for one volume no heal info can be retrieved, so we don't know the healing state of that volume. The volume is a replica 2 volume between vhost4-int/brick1 and vhost3-int/brick2.

The volume is accessible, but since I don't get any heal info, I don't know whether it is properly replicated. Any help in resolving this situation is highly appreciated.

hangs forever:
[root@vhost4 ~]# gluster volume heal vol4 info

glfsheal-vol4.log:
[2015-09-24 07:47:59.284723] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-09-24 07:47:59.293735] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2015-09-24 07:47:59.294061] I [MSGID: 104045] [glfs-master.c:95:notify] 0-gfapi: New graph 76686f73-7434-2e61-6c6c-61626f757461 (0) coming up
[2015-09-24 07:47:59.294081] I [MSGID: 114020] [client.c:2118:notify] 0-vol4-client-1: parent translators are ready, attempting connect on transport
[2015-09-24 07:47:59.309470] I [MSGID: 114020] [client.c:2118:notify] 0-vol4-client-2: parent translators are ready, attempting connect on transport
[2015-09-24 07:47:59.310525] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol4-client-1: changing port to 49155 (from 0)
[2015-09-24 07:47:59.315958] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-vol4-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-09-24 07:47:59.316481] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vol4-client-1: Connected to vol4-client-1, attached to remote volume '/storage/brick2/brick2'.
[2015-09-24 07:47:59.316495] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vol4-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2015-09-24 07:47:59.316538] I [MSGID: 108005] [afr-common.c:3960:afr_notify] 0-vol4-replicate-0: Subvolume 'vol4-client-1' came back up; going online.
[2015-09-24 07:47:59.317150] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vol4-client-1: Server lk version = 1
[2015-09-24 07:47:59.320898] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol4-client-2: changing port to 49154 (from 0)
[2015-09-24 07:47:59.325633] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-vol4-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2015-09-24 07:47:59.325780] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vol4-client-2: Connected to vol4-client-2, attached to remote volume '/storage/brick1/brick1'.
[2015-09-24 07:47:59.325791] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vol4-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2015-09-24 07:47:59.333346] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vol4-client-2: Server lk version = 1
[2015-09-24 07:47:59.334545] I [MSGID: 108031] [afr-common.c:1745:afr_local_discovery_cbk] 0-vol4-replicate-0: selecting local read_child vol4-client-2
[2015-09-24 07:47:59.335833] I [MSGID: 104041] [glfs-resolve.c:862:__glfs_active_subvol] 0-vol4: switched to graph 76686f73-7434-2e61-6c6c-61626f757461 (0)

Questions about this output:
-) Why does it report "Using Program GlusterFS 3.3, Num (1298437), Version (330)"? We run 3.7.3.
-) gluster logs timestamps in UTC without taking the server timezone into account. Is there a way to change this?
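
As a stopgap for the timestamp question, the log lines can be converted while reading them. Below is a minimal sketch in Python, assuming every entry starts with a bracketed UTC timestamp in the format shown in the log above. The function name localize_log_line is made up for illustration and is not part of gluster, and the +02:00 offset in the example is only an assumption about the server's timezone.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches a leading timestamp such as "[2015-09-24 07:47:59.284723]".
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6})\]")


def localize_log_line(line, tz=None):
    """Rewrite the leading UTC timestamp of a log line in another timezone.

    tz is any datetime.tzinfo; None means the system's local timezone.
    Lines that do not start with a timestamp are returned unchanged.
    """
    match = TS_RE.match(line)
    if match is None:
        return line
    # The logged time is assumed to be UTC, as observed in the logs.
    utc_time = datetime.strptime(
        match.group(1), "%Y-%m-%d %H:%M:%S.%f"
    ).replace(tzinfo=timezone.utc)
    local_time = utc_time.astimezone(tz)
    stamp = local_time.strftime("%Y-%m-%d %H:%M:%S.%f %z")
    return "[" + stamp + "]" + line[match.end():]


if __name__ == "__main__":
    # Hypothetical fixed offset of +02:00, chosen only for the example.
    example_tz = timezone(timedelta(hours=2))
    sample = "[2015-09-24 07:47:59.284723] I [MSGID: 101190] 0-epoll: Started thread with index 1"
    print(localize_log_line(sample, example_tz))
```

This only changes how the logs are displayed; it does not alter what gluster writes, so the question of whether the logging itself can be configured still stands.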

etc-glusterfs-glusterd.vol.log:
no log entries after the volume heal info command

storage-brick1-brick1.log:
[2015-09-24 07:47:59.325720] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 67ef1559-d3a1-403a-b8e7-fb145c3acf4e
[2015-09-24 07:47:59.325743] I [MSGID: 115029] [server-handshake.c:610:server_setvolume] 0-vol4-server: accepted client from vhost4.allaboutapps.at-14900-2015/09/24-07:47:59:282313-vol4-client-2-0-0 (version: 3.7.3)

storage-brick2-brick2.log:
no log entries after the volume heal info command


Thanks,

- Andreas

