----- Original Message -----
> From: "Andreas Mather" <andreas@xxxxxxxxxxxxxxx>
> To: "Anuradha Talur" <atalur@xxxxxxxxxx>
> Cc: "Gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
> Sent: Thursday, September 24, 2015 6:59:38 PM
> Subject: Re: gluster 3.7.3 - volume heal info hangs - unknown heal status
>
> Hi Anuradha!
>
> Thanks for your reply! Attached you can find the dump files. As I'm not
> sure if they make their way through as attachments, here're links to them
> as well:
>
> brick1 - http://pastebin.com/3ivkhuRH
> brick2 - http://pastebin.com/77sT1mut

Hi,

I see some blocked locks from the statedump. Could you let me know what kind of workload you had when you observed the hang?
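In case you want to see what I'm looking at: in a brick statedump the locks translator marks granted locks with "(ACTIVE)" and waiting ones with "(BLOCKED)", so a rough way to pull them out on a server is something like the following (the wildcard simply matches whatever dump files the statedump command wrote to /var/run/gluster):

[root@vhost4 ~]# grep -c BLOCKED /var/run/gluster/*.dump.*
[root@vhost4 ~]# grep -B6 BLOCKED /var/run/gluster/*.dump.* | less

The context lines before each match should include the path= entry of the locked inode, i.e. which file the blocked lock is queued on.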
>
> - Andreas
>
>
> On Thu, Sep 24, 2015 at 3:18 PM, Anuradha Talur <atalur@xxxxxxxxxx> wrote:
>
> >
> > ----- Original Message -----
> > > From: "Andreas Mather" <andreas@xxxxxxxxxxxxxxx>
> > > To: "Gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
> > > Sent: Thursday, September 24, 2015 1:24:12 PM
> > > Subject: gluster 3.7.3 - volume heal info hangs - unknown heal status
> > >
> > > Hi!
> > >
> > > Our provider had network maintenance last night, so 2 of our 4 servers got disconnected and reconnected. Since we knew this was coming, we shifted all workload off the affected servers. This morning, most of the cluster seems fine, but for one volume no heal info can be retrieved, so we basically don't know about the healing state of that volume. The volume is a replica 2 volume between vhost4-int/brick1 and vhost3-int/brick2.
> > >
> > > The volume is accessible, but since I don't get any heal info, I don't know if it is properly replicated. Any help to resolve this situation is highly appreciated.
> > >
> > > hangs forever:
> > > [root@vhost4 ~]# gluster volume heal vol4 info
> > >
> > > glfsheal-vol4.log:
> > > [2015-09-24 07:47:59.284723] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
> > > [2015-09-24 07:47:59.293735] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
> > > [2015-09-24 07:47:59.294061] I [MSGID: 104045] [glfs-master.c:95:notify] 0-gfapi: New graph 76686f73-7434-2e61-6c6c-61626f757461 (0) coming up
> > > [2015-09-24 07:47:59.294081] I [MSGID: 114020] [client.c:2118:notify] 0-vol4-client-1: parent translators are ready, attempting connect on transport
> > > [2015-09-24 07:47:59.309470] I [MSGID: 114020] [client.c:2118:notify] 0-vol4-client-2: parent translators are ready, attempting connect on transport
> > > [2015-09-24 07:47:59.310525] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol4-client-1: changing port to 49155 (from 0)
> > > [2015-09-24 07:47:59.315958] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-vol4-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> > > [2015-09-24 07:47:59.316481] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vol4-client-1: Connected to vol4-client-1, attached to remote volume '/storage/brick2/brick2'.
> > > [2015-09-24 07:47:59.316495] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vol4-client-1: Server and Client lk-version numbers are not same, reopening the fds
> > > [2015-09-24 07:47:59.316538] I [MSGID: 108005] [afr-common.c:3960:afr_notify] 0-vol4-replicate-0: Subvolume 'vol4-client-1' came back up; going online.
> > > [2015-09-24 07:47:59.317150] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vol4-client-1: Server lk version = 1
> > > [2015-09-24 07:47:59.320898] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol4-client-2: changing port to 49154 (from 0)
> > > [2015-09-24 07:47:59.325633] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-vol4-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
> > > [2015-09-24 07:47:59.325780] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-vol4-client-2: Connected to vol4-client-2, attached to remote volume '/storage/brick1/brick1'.
> > > [2015-09-24 07:47:59.325791] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-vol4-client-2: Server and Client lk-version numbers are not same, reopening the fds
> > > [2015-09-24 07:47:59.333346] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-vol4-client-2: Server lk version = 1
> > > [2015-09-24 07:47:59.334545] I [MSGID: 108031] [afr-common.c:1745:afr_local_discovery_cbk] 0-vol4-replicate-0: selecting local read_child vol4-client-2
> > > [2015-09-24 07:47:59.335833] I [MSGID: 104041] [glfs-resolve.c:862:__glfs_active_subvol] 0-vol4: switched to graph 76686f73-7434-2e61-6c6c-61626f757461 (0)
> > >
> > > Questions about this output:
> > > -) Why does it report "Using Program GlusterFS 3.3, Num (1298437), Version (330)"? We run 3.7.3?!
> > > -) gluster logs timestamps in UTC, not taking the server timezone into account. Is there a way to fix this?
> > >
> > > etc-glusterfs-glusterd.vol.log:
> > > no new log entries after the volume heal info command
> > >
> > > storage-brick1-brick1.log:
> > > [2015-09-24 07:47:59.325720] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 67ef1559-d3a1-403a-b8e7-fb145c3acf4e
> > > [2015-09-24 07:47:59.325743] I [MSGID: 115029] [server-handshake.c:610:server_setvolume] 0-vol4-server: accepted client from vhost4.allaboutapps.at-14900-2015/09/24-07:47:59:282313-vol4-client-2-0-0 (version: 3.7.3)
> > >
> > > storage-brick2-brick2.log:
> > > no new log entries after the volume heal info command
> > >
> >
> > Hi Andreas,
> >
> > Could you please provide the following information so that we can understand why the command is hanging?
> > When the command is hung, run the following command from one of the servers:
> > `gluster volume statedump <volname>`
> > This command will generate statedumps of the glusterfsd processes on the servers. You can find them at /var/run/gluster. A typical statedump for a brick has "<brick-path>.<pid-of-brick>.dump.<timestamp>" as its name. Could you please attach them and respond?
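(For vol4 the sequence would look roughly like the following; the pid and timestamp in the example file name are made up, only the naming pattern matters.)

[root@vhost4 ~]# gluster volume statedump vol4
[root@vhost4 ~]# ls /var/run/gluster/
storage-brick1-brick1.3456.dump.1443080879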
> >
> > > Thanks,
> > >
> > > - Andreas
> > >
> > >
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users@xxxxxxxxxxx
> > > http://www.gluster.org/mailman/listinfo/gluster-users
> >
> > --
> > Thanks,
> > Anuradha.
>

--
Thanks,
Anuradha.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users