Further to this, I've noticed something which might have been a bit of a
red herring in my previous post.
We have 3 volumes - gv0, voicemail and callrec. callrec is the only one
showing self heal entries, yet all of the "No such file or directory"
errors in glustershd.log appear to refer to gv0. gv0 has no self heal
entries shown by "gluster volume heal gv0 info", and no split brain
entries either.
If I de-dupe those log entries, I just get these:
[root@gluster1a-1 glusterfs]# grep gfid: glustershd.log | awk -F\] '{print $3}' | sort | uniq
0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:436dcbec-a12a-4df9-b8ef-bae977c98537> (436dcbec-a12a-4df9-b8ef-bae977c98537)
0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:81dc9194-2379-40b5-a949-f7550433b2e0> (81dc9194-2379-40b5-a949-f7550433b2e0)
0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:b1e273ad-9eb1-4f97-a41c-39eecb149bd6> (b1e273ad-9eb1-4f97-a41c-39eecb149bd6)
0-gv0-client-1: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
0-gv0-client-1: remote operation failed: No such file or directory. Path: <gfid:436dcbec-a12a-4df9-b8ef-bae977c98537> (436dcbec-a12a-4df9-b8ef-bae977c98537)
0-gv0-client-1: remote operation failed: No such file or directory. Path: <gfid:81dc9194-2379-40b5-a949-f7550433b2e0> (81dc9194-2379-40b5-a949-f7550433b2e0)
0-gv0-client-3: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
0-gv0-client-3: remote operation failed: No such file or directory. Path: <gfid:81dc9194-2379-40b5-a949-f7550433b2e0> (81dc9194-2379-40b5-a949-f7550433b2e0)
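The same de-duplication can be done grouped by gfid instead of by client, which makes it easier to see which clients complain about each entry. This is only a rough sketch: the regular expression assumes the warning format shown in the quoted post below, and the names `LINE_RE`, `clients_by_gfid` and `SAMPLE_WARNINGS` are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed line shape (taken from the glustershd.log warnings in this thread):
#   ... 0-gv0-client-3: remote operation failed: No such file or directory.
#   Path: <gfid:...> (...)
LINE_RE = re.compile(
    r"0-(?P<client>[\w-]+-client-\d+): remote operation failed: "
    r".*?<gfid:(?P<gfid>[0-9a-f-]{36})>"
)


def clients_by_gfid(lines):
    """Map each gfid to the sorted list of clients that logged a failure for it."""
    seen = defaultdict(set)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            seen[match.group("gfid")].add(match.group("client"))
    return {gfid: sorted(clients) for gfid, clients in seen.items()}


# Three warnings copied from the log excerpt in the quoted post.
SAMPLE_WARNINGS = [
    "[2016-07-08 09:05:17.203017] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-3: remote operation failed: No such file or directory. Path: <gfid:81dc9194-2379-40b5-a949-f7550433b2e0> (81dc9194-2379-40b5-a949-f7550433b2e0)",
    "[2016-07-08 09:05:17.204651] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)",
    "[2016-07-08 09:05:17.204879] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-1: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)",
]

for gfid, clients in sorted(clients_by_gfid(SAMPLE_WARNINGS).items()):
    print(gfid, ", ".join(clients))
```

To run it over a real log, pass the open file object in place of `SAMPLE_WARNINGS`. Lines that do not match the assumed format are silently skipped.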
There doesn't seem to be anything obvious to me in glustershd.log about the
callrec volume. On one of the bricks that stayed up:
[root@gluster1a-1 glusterfs]# grep callrec glustershd.log
[2016-07-08 08:54:03.424446] I [graph.c:269:gf_add_cmdline_options] 0-callrec-replicate-0: adding option 'node-uuid' for volume 'callrec-replicate-0' with value 'b9d3b1a2-3214-41ba-a1c9-9c7d4b18ff5d'
[2016-07-08 08:54:03.429663] I [client.c:2280:notify] 0-callrec-client-0: parent translators are ready, attempting connect on transport
[2016-07-08 08:54:03.432198] I [client.c:2280:notify] 0-callrec-client-1: parent translators are ready, attempting connect on transport
[2016-07-08 08:54:03.434375] I [client.c:2280:notify] 0-callrec-client-2: parent translators are ready, attempting connect on transport
[2016-07-08 08:54:03.436521] I [client.c:2280:notify] 0-callrec-client-3: parent translators are ready, attempting connect on transport
1: volume callrec-client-0
5: option remote-subvolume /data/brick/callrec
11: volume callrec-client-1
15: option remote-subvolume /data/brick/callrec
21: volume callrec-client-2
25: option remote-subvolume /data/brick/callrec
31: volume callrec-client-3
35: option remote-subvolume /data/brick/callrec
41: volume callrec-replicate-0
50: subvolumes callrec-client-0 callrec-client-1 callrec-client-2 callrec-client-3
159: subvolumes callrec-replicate-0 gv0-replicate-0 voicemail-replicate-0
[2016-07-08 08:54:03.458708] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-0: changing port to 49153 (from 0)
[2016-07-08 08:54:03.465684] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-08 08:54:03.465921] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-0: Connected to callrec-client-0, attached to remote volume '/data/brick/callrec'.
[2016-07-08 08:54:03.465927] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:03.465967] I [MSGID: 108005] [afr-common.c:3669:afr_notify] 0-callrec-replicate-0: Subvolume 'callrec-client-0' came back up; going online.
[2016-07-08 08:54:03.466108] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-0: Server lk version = 1
[2016-07-08 08:54:04.266979] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-1: changing port to 49153 (from 0)
[2016-07-08 08:54:04.732625] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-2: changing port to 49153 (from 0)
[2016-07-08 08:54:04.738533] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-08 08:54:04.738911] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-2: Connected to callrec-client-2, attached to remote volume '/data/brick/callrec'.
[2016-07-08 08:54:04.738921] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:04.739181] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-2: Server lk version = 1
[2016-07-08 08:54:05.271388] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-08 08:54:05.271858] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-1: Connected to callrec-client-1, attached to remote volume '/data/brick/callrec'.
[2016-07-08 08:54:05.271879] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:05.272185] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-1: Server lk version = 1
[2016-07-08 08:54:06.302301] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-3: changing port to 49153 (from 0)
[2016-07-08 08:54:06.305473] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-08 08:54:06.305915] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-3: Connected to callrec-client-3, attached to remote volume '/data/brick/callrec'.
[2016-07-08 08:54:06.305925] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:06.306307] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-3: Server lk version = 1
And on the brick that went offline for a few days:
[root@gluster2a-1 glusterfs]# grep callrec glustershd.log
[2016-07-08 08:54:06.900964] I [graph.c:269:gf_add_cmdline_options] 0-callrec-replicate-0: adding option 'node-uuid' for volume 'callrec-replicate-0' with value 'e96ae8cd-f38f-4c2a-bb3b-baeb78f88f13'
[2016-07-08 08:54:06.906449] I [client.c:2280:notify] 0-callrec-client-0: parent translators are ready, attempting connect on transport
[2016-07-08 08:54:06.908851] I [client.c:2280:notify] 0-callrec-client-1: parent translators are ready, attempting connect on transport
[2016-07-08 08:54:06.911045] I [client.c:2280:notify] 0-callrec-client-2: parent translators are ready, attempting connect on transport
[2016-07-08 08:54:06.913528] I [client.c:2280:notify] 0-callrec-client-3: parent translators are ready, attempting connect on transport
1: volume callrec-client-0
5: option remote-subvolume /data/brick/callrec
11: volume callrec-client-1
15: option remote-subvolume /data/brick/callrec
21: volume callrec-client-2
25: option remote-subvolume /data/brick/callrec
31: volume callrec-client-3
35: option remote-subvolume /data/brick/callrec
41: volume callrec-replicate-0
50: subvolumes callrec-client-0 callrec-client-1 callrec-client-2 callrec-client-3
159: subvolumes callrec-replicate-0 gv0-replicate-0 voicemail-replicate-0
[2016-07-08 08:54:06.938769] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-2: changing port to 49153 (from 0)
[2016-07-08 08:54:06.948204] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-1: changing port to 49153 (from 0)
[2016-07-08 08:54:06.951625] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-2: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-08 08:54:06.951849] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-2: Connected to callrec-client-2, attached to remote volume '/data/brick/callrec'.
[2016-07-08 08:54:06.951858] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-2: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:06.951906] I [MSGID: 108005] [afr-common.c:3669:afr_notify] 0-callrec-replicate-0: Subvolume 'callrec-client-2' came back up; going online.
[2016-07-08 08:54:06.951938] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-2: Server lk version = 1
[2016-07-08 08:54:07.152217] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-3: changing port to 49153 (from 0)
[2016-07-08 08:54:07.167137] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-08 08:54:07.167474] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-1: Connected to callrec-client-1, attached to remote volume '/data/brick/callrec'.
[2016-07-08 08:54:07.167483] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:07.167664] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-1: Server lk version = 1
[2016-07-08 08:54:07.240249] I [rpc-clnt.c:1761:rpc_clnt_reconfig] 0-callrec-client-0: changing port to 49153 (from 0)
[2016-07-08 08:54:07.243156] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-08 08:54:07.243512] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-0: Connected to callrec-client-0, attached to remote volume '/data/brick/callrec'.
[2016-07-08 08:54:07.243520] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:07.243804] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-0: Server lk version = 1
[2016-07-08 08:54:07.400188] I [client-handshake.c:1413:select_server_supported_programs] 0-callrec-client-3: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-07-08 08:54:07.400574] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-3: Connected to callrec-client-3, attached to remote volume '/data/brick/callrec'.
[2016-07-08 08:54:07.400583] I [client-handshake.c:1210:client_setvolume_cbk] 0-callrec-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:07.400802] I [client-handshake.c:188:client_set_lk_version_cbk] 0-callrec-client-3: Server lk version = 1
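To check quickly that every callrec client reconnected on a given host, the "Connected to" lines can be pulled out of the log. Again, this is just a sketch that assumes the log format pasted above; `CONNECTED_RE`, `connected_clients` and `SAMPLE_CONNECTS` are illustrative names only.

```python
import re

# Assumed line shape (from the client_setvolume_cbk messages above):
#   ... Connected to callrec-client-2, attached to remote volume '...'.
CONNECTED_RE = re.compile(
    r"Connected to (?P<client>[\w-]+-client-\d+), attached to remote volume"
)


def connected_clients(lines, volume):
    """Return the sorted client names of `volume` that logged a successful connect."""
    prefix = volume + "-client-"
    found = set()
    for line in lines:
        match = CONNECTED_RE.search(line)
        if match and match.group("client").startswith(prefix):
            found.add(match.group("client"))
    return sorted(found)


# Lines copied from the glustershd.log excerpts in this thread.
SAMPLE_CONNECTS = [
    "[2016-07-08 08:54:06.951849] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-2: Connected to callrec-client-2, attached to remote volume '/data/brick/callrec'.",
    "[2016-07-08 08:54:07.167474] I [client-handshake.c:1200:client_setvolume_cbk] 0-callrec-client-1: Connected to callrec-client-1, attached to remote volume '/data/brick/callrec'.",
    "[2016-07-08 08:54:07.411486] I [client-handshake.c:1200:client_setvolume_cbk] 0-gv0-client-1: Connected to gv0-client-1, attached to remote volume '/data/brick/gv0'.",
]

print(connected_clients(SAMPLE_CONNECTS, "callrec"))
```

Run against the full grep output from either host above, this should list all four callrec clients, since each of them has a "Connected to" line in both logs.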
Cheers,
Kingsley.
On Fri, 2016-07-08 at 10:08 +0100, Kingsley wrote:
Hi,
One of our bricks was offline for a few days when it didn't reboot after
a yum update (the gluster version wasn't changed). The volume heal info
is showing the same 129 entries, all of the format
<gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> on the 3 bricks that
remained up, and no entries on the brick that was offline.
glustershd.log on the brick that was offline has stuff like this in it:
[2016-07-08 08:54:07.411486] I [client-handshake.c:1200:client_setvolume_cbk] 0-gv0-client-1: Connected to gv0-client-1, attached to remote volume '/data/brick/gv0'.
[2016-07-08 08:54:07.411493] I [client-handshake.c:1210:client_setvolume_cbk] 0-gv0-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:07.411678] I [client-handshake.c:188:client_set_lk_version_cbk] 0-gv0-client-1: Server lk version = 1
[2016-07-08 08:54:07.793661] I [client-handshake.c:1200:client_setvolume_cbk] 0-gv0-client-3: Connected to gv0-client-3, attached to remote volume '/data/brick/gv0'.
[2016-07-08 08:54:07.793688] I [client-handshake.c:1210:client_setvolume_cbk] 0-gv0-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2016-07-08 08:54:07.794091] I [client-handshake.c:188:client_set_lk_version_cbk] 0-gv0-client-3: Server lk version = 1
but glustershd.log on the other 3 bricks has many lines looking like
this:
[2016-07-08 09:05:17.203017] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-3: remote operation failed: No such file or directory. Path: <gfid:81dc9194-2379-40b5-a949-f7550433b2e0> (81dc9194-2379-40b5-a949-f7550433b2e0)
[2016-07-08 09:05:17.203405] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:b1e273ad-9eb1-4f97-a41c-39eecb149bd6> (b1e273ad-9eb1-4f97-a41c-39eecb149bd6)
[2016-07-08 09:05:17.204035] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:436dcbec-a12a-4df9-b8ef-bae977c98537> (436dcbec-a12a-4df9-b8ef-bae977c98537)
[2016-07-08 09:05:17.204225] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-1: remote operation failed: No such file or directory. Path: <gfid:436dcbec-a12a-4df9-b8ef-bae977c98537> (436dcbec-a12a-4df9-b8ef-bae977c98537)
[2016-07-08 09:05:17.204651] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-0: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
[2016-07-08 09:05:17.204879] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-1: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
[2016-07-08 09:05:17.205042] W [client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-gv0-client-3: remote operation failed: No such file or directory. Path: <gfid:08713e43-7bcb-43f3-818a-7b062abd6e95> (08713e43-7bcb-43f3-818a-7b062abd6e95)
How do I fix this? I need to update the other bricks but am reluctant to
do so until the volume is in good shape.
We're running Gluster 3.6.3 on CentOS 7. Volume info:
Volume Name: callrec
Type: Replicate
Volume ID: a39830b7-eddb-4061-b381-39411274131a
Status: Started
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: gluster1a-1:/data/brick/callrec
Brick2: gluster1b-1:/data/brick/callrec
Brick3: gluster2a-1:/data/brick/callrec
Brick4: gluster2b-1:/data/brick/callrec
Options Reconfigured:
performance.flush-behind: off