Further to this, I've noticed something which might have been a bit of a
red herring in my previous post.

We have 3 volumes - gv0, voicemail and callrec. callrec is the only one
showing self heal entries, yet all of the "No such file or directory"
errors in glustershd.log appear to refer to gv0. gv0 has no self heal
entries shown by "gluster volume heal gv0 info", and no split brain
entries either.

If I de-dupe those log entries, I just get these:

[root@gluster1a-1 glusterfs]# grep gfid: glustershd.log | awk -F\] '{print $3}' | sort | uniq
0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:436dcbec-a12a-4df9-b8ef-bae977c98537> (436dcbec-a12a-4df9-b8ef-bae977c98537)
0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:81dc9194-2379-40b5-a949-f7550433b2e0> (81dc9194-2379-40b5-a949-f7550433b2e0)
0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:b1e273ad-9eb1-4f97-a41c-39eecb149bd6> (b1e273ad-9eb1-4f97-a41c-39eecb149bd6)
0-gv0-client-1: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
0-gv0-client-1: remote operation failed: No such file or directory. Path: <gfid:436dcbec-a12a-4df9-b8ef-bae977c98537> (436dcbec-a12a-4df9-b8ef-bae977c98537)
0-gv0-client-1: remote operation failed: No such file or directory. Path: <gfid:81dc9194-2379-40b5-a949-f7550433b2e0> (81dc9194-2379-40b5-a949-f7550433b2e0)
0-gv0-client-3: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
0-gv0-client-3: remote operation failed: No such file or directory. Path: <gfid:81dc9194-2379-40b5-a949-f7550433b2e0> (81dc9194-2379-40b5-a949-f7550433b2e0)

There doesn't seem to be anything obvious to me in glustershd.log about
the callrec volume. On one of the bricks that stayed up:

[root@gluster1a-1 glusterfs]# grep callrec glustershd.log
[2016-07-08 08:54:03.424446] I [graph.c:269:gf_add_cmdline_options] 0-callrec-replicate-0: adding option 'node-uuid' for volume 'callrec-replicate-0' with value 'b9d3b1a2-3214-41ba-a1c9-9c7d4b18ff5d'
[2016-07-08 08:54:03.429663] I [client.c:2280:notify] 0-callrec-client-0: parent translators are ready, attempting connect on transport
[2016-07-08 08:54:03.432198] I [client.c:2280:notify] 0-callrec-client-1: parent translators are ready, attempting connect on transport
[2016-07-08 08:54:03.434375] I [client.c:2280:notify] 0-callrec-client-2: parent translators are ready, attempting connect on transport
[2016-07-08 08:54:03.436521] I [client.c:2280:notify] 0-callrec-client-3: parent translators are ready, attempting connect on transport
1: volume callrec-client-0
5: option remote-subvolume /data/brick/callrec
11: volume callrec-client-1
15: option remote-subvolume /data/brick/callrec
21: volume callrec-client-2
25: option remote-subvolume /data/brick/callrec
31: volume callrec-client-3
35: option remote-subvolume /data/brick/callrec
41: volume callrec-replicate-0
50: subvolumes callrec-client-0 callrec-client-1 callrec-client-2 callrec-client-3
159: subvolumes callrec-replicate-0 gv0-replicate-0 voicemail-replicate-0
[2016-07-08 08:54:03.458708] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-0: changing port to 49153 (from 0)
[2016-07-08 08:54:03.465684] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-08 08:54:03.465921] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-0: Connected to callrec-client-0, attached to remote volume '/data/brick/callrec'.
[2016-07-08 08:54:03.465927] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:03.465967] I [MSGID: 108005] [afr-common.c:3669:afr_notify] 0-callrec-replicate-0: Subvolume 'callrec-client-0' came back up; going online.
[2016-07-08 08:54:03.466108] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-0: Server lk version = 1
[2016-07-08 08:54:04.266979] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-1: changing port to 49153 (from 0)
[2016-07-08 08:54:04.732625] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-2: changing port to 49153 (from 0)
[2016-07-08 08:54:04.738533] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-08 08:54:04.738911] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-2: Connected to callrec-client-2, attached to remote volume '/data/brick/callrec'.
[2016-07-08 08:54:04.738921] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:04.739181] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-2: Server lk version = 1
[2016-07-08 08:54:05.271388] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-08 08:54:05.271858] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-1: Connected to callrec-client-1, attached to remote volume '/data/brick/callrec'.
[2016-07-08 08:54:05.271879] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:05.272185] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-1: Server lk version = 1
[2016-07-08 08:54:06.302301] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-3: changing port to 49153 (from 0)
[2016-07-08 08:54:06.305473] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-08 08:54:06.305915] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-3: Connected to callrec-client-3, attached to remote volume '/data/brick/callrec'.
[2016-07-08 08:54:06.305925] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:06.306307] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-3: Server lk version = 1

And on the brick that went offline for a few days:

[root@gluster2a-1 glusterfs]# grep callrec glustershd.log
[2016-07-08 08:54:06.900964] I [graph.c:269:gf_add_cmdline_options] 0-callrec-replicate-0: adding option 'node-uuid' for volume 'callrec-replicate-0' with value 'e96ae8cd-f38f-4c2a-bb3b-baeb78f88f13'
[2016-07-08 08:54:06.906449] I [client.c:2280:notify] 0-callrec-client-0: parent translators are ready, attempting connect on transport
[2016-07-08 08:54:06.908851] I [client.c:2280:notify] 0-callrec-client-1: parent translators are ready, attempting connect on transport
[2016-07-08 08:54:06.911045] I [client.c:2280:notify] 0-callrec-client-2: parent translators are ready, attempting connect on transport
[2016-07-08 08:54:06.913528] I [client.c:2280:notify] 0-callrec-client-3: parent translators are ready, attempting connect on transport
1: volume callrec-client-0
5: option remote-subvolume /data/brick/callrec
11: volume callrec-client-1
15: option remote-subvolume /data/brick/callrec
21: volume callrec-client-2
25: option remote-subvolume /data/brick/callrec
31: volume callrec-client-3
35: option remote-subvolume /data/brick/callrec
41: volume callrec-replicate-0
50: subvolumes callrec-client-0 callrec-client-1 callrec-client-2 callrec-client-3
159: subvolumes callrec-replicate-0 gv0-replicate-0 voicemail-replicate-0
[2016-07-08 08:54:06.938769] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-2: changing port to 49153 (from 0)
[2016-07-08 08:54:06.948204] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-1: changing port to 49153 (from 0)
[2016-07-08 08:54:06.951625] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-08 08:54:06.951849] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-2: Connected to callrec-client-2, attached to remote volume '/data/brick/callrec'.
[2016-07-08 08:54:06.951858] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:06.951906] I [MSGID: 108005] [afr-common.c:3669:afr_notify] 0-callrec-replicate-0: Subvolume 'callrec-client-2' came back up; going online.
[2016-07-08 08:54:06.951938] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-2: Server lk version = 1
[2016-07-08 08:54:07.152217] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-3: changing port to 49153 (from 0)
[2016-07-08 08:54:07.167137] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-08 08:54:07.167474] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-1: Connected to callrec-client-1, attached to remote volume '/data/brick/callrec'.
[2016-07-08 08:54:07.167483] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:07.167664] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-1: Server lk version = 1
[2016-07-08 08:54:07.240249] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-0: changing port to 49153 (from 0)
[2016-07-08 08:54:07.243156] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-08 08:54:07.243512] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-0: Connected to callrec-client-0, attached to remote volume '/data/brick/callrec'.
[2016-07-08 08:54:07.243520] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:07.243804] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-0: Server lk version = 1
[2016-07-08 08:54:07.400188] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-08 08:54:07.400574] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-3: Connected to callrec-client-3, attached to remote volume '/data/brick/callrec'.
[2016-07-08 08:54:07.400583] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:07.400802] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-3: Server lk version = 1

Cheers,
Kingsley.

On Fri, 2016-07-08 at 10:08 +0100, Kingsley wrote:
> Hi,
>
> One of our bricks was offline for a few days when it didn't reboot after
> a yum update (the gluster version wasn't changed). The volume heal info
> is showing the same 129 entries, all of the format
> <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> on the 3 bricks that
> remained up, and no entries on the brick that was offline.
>
> glustershd.log on the brick that was offline has stuff like this in it:
>
> [2016-07-08 08:54:07.411486] I [client-handshake.c:1200:client_setvolume_cbk] 0-gv0-client-1: Connected to gv0-client-1, attached to remote volume '/data/brick/gv0'.
> [2016-07-08 08:54:07.411493] I [client-handshake.c:1210:client_setvolume_cbk] 0-gv0-client-1: Server and Client lk-version numbers are not same, reopening the fds
> [2016-07-08 08:54:07.411678] I [client-handshake.c:188:client_set_lk_version_cbk] 0-gv0-client-1: Server lk version = 1
> [2016-07-08 08:54:07.793661] I [client-handshake.c:1200:client_setvolume_cbk] 0-gv0-client-3: Connected to gv0-client-3, attached to remote volume '/data/brick/gv0'.
> [2016-07-08 08:54:07.793688] I [client-handshake.c:1210:client_setvolume_cbk] 0-gv0-client-3: Server and Client lk-version numbers are not same, reopening the fds
> [2016-07-08 08:54:07.794091] I [client-handshake.c:188:client_set_lk_version_cbk] 0-gv0-client-3: Server lk version = 1
>
> but glustershd.log on the other 3 bricks has many lines looking like
> this:
>
> [2016-07-08 09:05:17.203017] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-3: remote operation failed: No such file or directory. Path: <gfid:81dc9194-2379-40b5-a949-f7550433b2e0> (81dc9194-2379-40b5-a949-f7550433b2e0)
> [2016-07-08 09:05:17.203405] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:b1e273ad-9eb1-4f97-a41c-39eecb149bd6> (b1e273ad-9eb1-4f97-a41c-39eecb149bd6)
> [2016-07-08 09:05:17.204035] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:436dcbec-a12a-4df9-b8ef-bae977c98537> (436dcbec-a12a-4df9-b8ef-bae977c98537)
> [2016-07-08 09:05:17.204225] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-1: remote operation failed: No such file or directory. Path: <gfid:436dcbec-a12a-4df9-b8ef-bae977c98537> (436dcbec-a12a-4df9-b8ef-bae977c98537)
> [2016-07-08 09:05:17.204651] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
> [2016-07-08 09:05:17.204879] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-1: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
> [2016-07-08 09:05:17.205042] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-3: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
>
> How do I fix this? I need to update the other bricks but am reluctant to
> do so until the volume is in good shape first.
>
> We're running Gluster 3.6.3 on CentOS 7. Volume info:
>
> Volume Name: callrec
> Type: Replicate
> Volume ID: a39830b7-eddb-4061-b381-39411274131a
> Status: Started
> Number of Bricks: 1 x 4 = 4
> Transport-type: tcp
> Bricks:
> Brick1: gluster1a-1:/data/brick/callrec
> Brick2: gluster1b-1:/data/brick/callrec
> Brick3: gluster2a-1:/data/brick/callrec
> Brick4: gluster2b-1:/data/brick/callrec
> Options Reconfigured:
> performance.flush-behind: off
>
>

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
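
For anyone wanting to repeat the de-duplication on their own logs, here is a
rough sketch of how the GFID warnings could be summarised per client, and how
a GFID could be turned into the path where a brick would normally keep its
backing entry (under .glusterfs, keyed by the first two pairs of hex digits).
The helper names gfid_summary and gfid_backend_path are made up for this
example and are not Gluster commands, the log format is assumed to match the
glustershd.log excerpts in this thread, and none of this has been run against
the cluster discussed here.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical, not part of Gluster.

# Print de-duplicated "client gfid" pairs from a glustershd.log-style file.
# Assumes warning lines of the form seen in this thread:
#   [...] W [...] 0-gv0-client-0: remote operation failed: ... Path: <gfid:UUID> (UUID)
gfid_summary() {
    sed -n 's/.* \(0-[^ ]*-client-[0-9]*\): remote operation failed:.*<gfid:\([0-9a-f-]*\)>.*/\1 \2/p' "$1" | sort -u
}

# Build the path where a brick would normally hold the entry for a GFID:
#   <brick>/.glusterfs/<hex 1-2>/<hex 3-4>/<full gfid>
gfid_backend_path() {
    brick=$1
    gfid=$2
    first=$(printf '%s' "$gfid" | cut -c1-2)
    second=$(printf '%s' "$gfid" | cut -c3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$brick" "$first" "$second" "$gfid"
}

# Example use (paths taken from this thread; adjust for your own bricks):
#   gfid_summary /var/log/glusterfs/glustershd.log
#   ls -l "$(gfid_backend_path /data/brick/gv0 08713e43-7bcb-43f3-818a-7b062abd6e95)"
```

Running ls on the resulting path on each brick in turn would show which bricks
actually hold an entry for that GFID and which do not, which may help narrow
down where the "No such file or directory" lookups are coming from.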