I think all issues are linked with the same underlying problem.
1. all peers were in Connected state from every node yesterday but node 2 is "semi-connected" at now:
root@glusterDevVM1:/var/log/glusterfs# gluster peer status
Number of Peers: 2
Hostname: glusterDevVM3
Uuid: 0d8a3686-9e37-4ce7-87bf-c85d1ec40974
State: Peer in Cluster (Connected)
Hostname: glusterDevVM2
Uuid: 7f6c3023-144b-4db2-9063-d90926dbdd18
State: Peer in Cluster (Connected)
Number of Peers: 2
Hostname: glusterDevVM3
Uuid: 0d8a3686-9e37-4ce7-87bf-c85d1ec40974
State: Peer in Cluster (Connected)
Hostname: glusterDevVM2
Uuid: 7f6c3023-144b-4db2-9063-d90926dbdd18
State: Peer in Cluster (Connected)
root@glusterDevVM2:~# gluster peer status
Number of Peers: 2
Hostname: glusterDevVM1
Uuid: e2263e4d-a307-45d5-9cec-e1791f7a45fb
State: Peer in Cluster (Disconnected)
Hostname: glusterDevVM3
Uuid: 0d8a3686-9e37-4ce7-87bf-c85d1ec40974
State: Peer in Cluster (Connected)
root@glusterDevVM3:~# gluster peer status
Number of Peers: 2
Hostname: glusterDevVM2
Uuid: 7f6c3023-144b-4db2-9063-d90926dbdd18
State: Peer in Cluster (Connected)
Hostname: glusterDevVM1
Uuid: e2263e4d-a307-45d5-9cec-e1791f7a45fb
State: Peer in Cluster (Connected)
Number of Peers: 2
Hostname: glusterDevVM1
Uuid: e2263e4d-a307-45d5-9cec-e1791f7a45fb
State: Peer in Cluster (Disconnected)
Hostname: glusterDevVM3
Uuid: 0d8a3686-9e37-4ce7-87bf-c85d1ec40974
State: Peer in Cluster (Connected)
root@glusterDevVM3:~# gluster peer status
Number of Peers: 2
Hostname: glusterDevVM2
Uuid: 7f6c3023-144b-4db2-9063-d90926dbdd18
State: Peer in Cluster (Connected)
Hostname: glusterDevVM1
Uuid: e2263e4d-a307-45d5-9cec-e1791f7a45fb
State: Peer in Cluster (Connected)
2, 3, 4. a simple gluster volume status show multiple errors on each node:
root@glusterDevVM1:~# gluster volume status tmp
Locking failed on glusterDevVM2. Please check log file for details.
root@glusterDevVM2:~# gluster volume status tmp
Another transaction is in progress for tmp. Please try again after some time.
root@glusterDevVM3:~# gluster volume status tmp
Error : Request timed out
Locking failed on glusterDevVM2. Please check log file for details.
root@glusterDevVM2:~# gluster volume status tmp
Another transaction is in progress for tmp. Please try again after some time.
root@glusterDevVM3:~# gluster volume status tmp
Error : Request timed out
Logs for each node (except SSL errors):
root@glusterDevVM1:~# egrep -v '0-socket.management' /var/log/glusterfs/glusterd.log
[2020-04-22 05:38:32.278618] E [rpc-clnt.c:346:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x138)[0x7fd28d99fda8] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcd97)[0x7fd28d745d97] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcebe)[0x7fd28d745ebe] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)[0x7fd28d746e93] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xea18)[0x7fd28d747a18] ))))) 0-management: forced unwinding frame type(Peer mgmt) op(--(4)) called at 2020-04-22 05:38:32.243087 (xid=0x8d)
[2020-04-22 05:38:32.278638] E [MSGID: 106157] [glusterd-rpc-ops.c:665:__glusterd_friend_update_cbk] 0-management: RPC Error
[2020-04-22 05:38:32.278651] I [MSGID: 106493] [glusterd-rpc-ops.c:681:__glusterd_friend_update_cbk] 0-management: Received RJT from uuid: 00000000-0000-0000-0000-000000000000
[2020-04-22 05:38:33.256401] I [MSGID: 106498] [glusterd-svc-helper.c:747:__glusterd_send_svc_configure_req] 0-management: not connected yet
[2020-04-22 05:38:35.279149] I [socket.c:4347:ssl_setup_connection_params] 0-management: SSL support on the I/O path is ENABLED
[2020-04-22 05:38:35.279169] I [socket.c:4350:ssl_setup_connection_params] 0-management: SSL support for glusterd is ENABLED
[2020-04-22 05:38:35.279178] I [socket.c:4360:ssl_setup_connection_params] 0-management: using certificate depth 1
The message "I [MSGID: 106004] [glusterd-handler.c:6204:__glusterd_peer_rpc_notify] 0-management: Peer <glusterDevVM2> (<7f6c3023-144b-4db2-9063-d90926dbdd18>), in state <Peer in Cluster>, has disconnected from glusterd." repeated 3 times between [2020-04-22 05:38:25.232116] and [2020-04-22 05:38:35.667153]
[2020-04-22 05:38:35.667255] W [glusterd-locks.c:796:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x22119) [0x7fd287da7119] -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x2caae) [0x7fd287db1aae] -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0xdf0d3) [0x7fd287e640d3] ) 0-management: Lock for vol <vol> not held
[2020-04-22 05:38:35.667275] W [MSGID: 106117] [glusterd-handler.c:6225:__glusterd_peer_rpc_notify] 0-management: Lock not released for <vol>
2 last lines repeated for each volume
[2020-04-22 05:38:32.278638] E [MSGID: 106157] [glusterd-rpc-ops.c:665:__glusterd_friend_update_cbk] 0-management: RPC Error
[2020-04-22 05:38:32.278651] I [MSGID: 106493] [glusterd-rpc-ops.c:681:__glusterd_friend_update_cbk] 0-management: Received RJT from uuid: 00000000-0000-0000-0000-000000000000
[2020-04-22 05:38:33.256401] I [MSGID: 106498] [glusterd-svc-helper.c:747:__glusterd_send_svc_configure_req] 0-management: not connected yet
[2020-04-22 05:38:35.279149] I [socket.c:4347:ssl_setup_connection_params] 0-management: SSL support on the I/O path is ENABLED
[2020-04-22 05:38:35.279169] I [socket.c:4350:ssl_setup_connection_params] 0-management: SSL support for glusterd is ENABLED
[2020-04-22 05:38:35.279178] I [socket.c:4360:ssl_setup_connection_params] 0-management: using certificate depth 1
The message "I [MSGID: 106004] [glusterd-handler.c:6204:__glusterd_peer_rpc_notify] 0-management: Peer <glusterDevVM2> (<7f6c3023-144b-4db2-9063-d90926dbdd18>), in state <Peer in Cluster>, has disconnected from glusterd." repeated 3 times between [2020-04-22 05:38:25.232116] and [2020-04-22 05:38:35.667153]
[2020-04-22 05:38:35.667255] W [glusterd-locks.c:796:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x22119) [0x7fd287da7119] -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x2caae) [0x7fd287db1aae] -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0xdf0d3) [0x7fd287e640d3] ) 0-management: Lock for vol <vol> not held
[2020-04-22 05:38:35.667275] W [MSGID: 106117] [glusterd-handler.c:6225:__glusterd_peer_rpc_notify] 0-management: Lock not released for <vol>
2 last lines repeated for each volume
root@glusterDevVM2:~# egrep -v '0-socket.management' /var/log/glusterfs/glusterd.log
[2020-04-22 05:51:57.493574] E [rpc-clnt.c:346:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x138)[0x7f30411dbda8] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcd97)[0x7f3040f81d97] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xcebe)[0x7f3040f81ebe] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)[0x7f3040f82e93] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(+0xea18)[0x7f3040f83a18] ))))) 0-management: forced unwinding frame type(Gluster MGMT Handshake) op(MGMT-VERS(1)) called at 2020-04-22 05:51:57.483579 (xid=0x563)
[2020-04-22 05:51:57.493623] E [MSGID: 106167] [glusterd-handshake.c:2040:__glusterd_mgmt_hndsk_version_cbk] 0-management: Error through RPC layer, retry again later
[2020-04-22 05:52:00.501474] I [socket.c:4347:ssl_setup_connection_params] 0-management: SSL support on the I/O path is ENABLED
[2020-04-22 05:52:00.501542] I [socket.c:4350:ssl_setup_connection_params] 0-management: SSL support for glusterd is ENABLED
[2020-04-22 05:52:00.501569] I [socket.c:4360:ssl_setup_connection_params] 0-management: using certificate depth 1
[2020-04-22 05:52:00.983720] I [MSGID: 106004] [glusterd-handler.c:6204:__glusterd_peer_rpc_notify] 0-management: Peer <glusterDevVM1> (<e2263e4d-a307-45d5-9cec-e1791f7a45fb>), in state <Peer in Cluster>, has disconnected from glusterd.
[2020-04-22 05:52:00.983886] W [glusterd-locks.c:796:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x22119) [0x7f303b5e3119] -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x2caae) [0x7f303b5edaae] -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0xdf0d3) [0x7f303b6a00d3] ) 0-management: Lock for vol <vol> not held
[2020-04-22 05:52:00.983909] W [MSGID: 106117] [glusterd-handler.c:6225:__glusterd_peer_rpc_notify] 0-management: Lock not released for <vol>
2 last lines repeated for each volume
[2020-04-22 05:51:57.493623] E [MSGID: 106167] [glusterd-handshake.c:2040:__glusterd_mgmt_hndsk_version_cbk] 0-management: Error through RPC layer, retry again later
[2020-04-22 05:52:00.501474] I [socket.c:4347:ssl_setup_connection_params] 0-management: SSL support on the I/O path is ENABLED
[2020-04-22 05:52:00.501542] I [socket.c:4350:ssl_setup_connection_params] 0-management: SSL support for glusterd is ENABLED
[2020-04-22 05:52:00.501569] I [socket.c:4360:ssl_setup_connection_params] 0-management: using certificate depth 1
[2020-04-22 05:52:00.983720] I [MSGID: 106004] [glusterd-handler.c:6204:__glusterd_peer_rpc_notify] 0-management: Peer <glusterDevVM1> (<e2263e4d-a307-45d5-9cec-e1791f7a45fb>), in state <Peer in Cluster>, has disconnected from glusterd.
[2020-04-22 05:52:00.983886] W [glusterd-locks.c:796:glusterd_mgmt_v3_unlock] (-->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x22119) [0x7f303b5e3119] -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0x2caae) [0x7f303b5edaae] -->/usr/lib/x86_64-linux-gnu/glusterfs/7.3/xlator/mgmt/glusterd.so(+0xdf0d3) [0x7f303b6a00d3] ) 0-management: Lock for vol <vol> not held
[2020-04-22 05:52:00.983909] W [MSGID: 106117] [glusterd-handler.c:6225:__glusterd_peer_rpc_notify] 0-management: Lock not released for <vol>
2 last lines repeated for each volume
root@glusterDevVM3:~# egrep -v '0-socket.management' /var/log/glusterfs/glusterd.log
[2020-04-22 05:38:33.229959] I [MSGID: 106499] [glusterd-handler.c:4264:__glusterd_handle_status_volume] 0-management: Received status volume req for volume tmp
[2020-04-22 05:41:33.230170] I [glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: unlock timer is cancelled for volume_type tmp_vol
[2020-04-22 05:48:34.908289] E [rpc-clnt.c:183:call_bail] 0-management: bailing out frame type(glusterd mgmt v3), op(--(1)), xid = 0x108, unique = 918, sent = 2020-04-22 05:38:33.230268, timeout = 600 for 10.5.1.7:24007
[2020-04-22 05:48:34.908339] E [MSGID: 106115] [glusterd-mgmt.c:117:gd_mgmt_v3_collate_errors] 0-management: Locking failed on glusterDevVM1. Please check log file for details.
[2020-04-22 05:48:40.288539] E [rpc-clnt.c:183:call_bail] 0-management: bailing out frame type(glusterd mgmt v3), op(--(1)), xid = 0x27, unique = 917, sent = 2020-04-22 05:38:33.230258, timeout = 600 for 10.5.1.8:24007
[2020-04-22 05:48:40.288568] E [MSGID: 106115] [glusterd-mgmt.c:117:gd_mgmt_v3_collate_errors] 0-management: Locking failed on glusterDevVM2. Please check log file for details.
[2020-04-22 05:48:40.288631] E [MSGID: 106150] [glusterd-syncop.c:1918:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2020-04-22 05:38:33.229959] I [MSGID: 106499] [glusterd-handler.c:4264:__glusterd_handle_status_volume] 0-management: Received status volume req for volume tmp
[2020-04-22 05:41:33.230170] I [glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: unlock timer is cancelled for volume_type tmp_vol
[2020-04-22 05:48:34.908289] E [rpc-clnt.c:183:call_bail] 0-management: bailing out frame type(glusterd mgmt v3), op(--(1)), xid = 0x108, unique = 918, sent = 2020-04-22 05:38:33.230268, timeout = 600 for 10.5.1.7:24007
[2020-04-22 05:48:34.908339] E [MSGID: 106115] [glusterd-mgmt.c:117:gd_mgmt_v3_collate_errors] 0-management: Locking failed on glusterDevVM1. Please check log file for details.
[2020-04-22 05:48:40.288539] E [rpc-clnt.c:183:call_bail] 0-management: bailing out frame type(glusterd mgmt v3), op(--(1)), xid = 0x27, unique = 917, sent = 2020-04-22 05:38:33.230258, timeout = 600 for 10.5.1.8:24007
[2020-04-22 05:48:40.288568] E [MSGID: 106115] [glusterd-mgmt.c:117:gd_mgmt_v3_collate_errors] 0-management: Locking failed on glusterDevVM2. Please check log file for details.
[2020-04-22 05:48:40.288631] E [MSGID: 106150] [glusterd-syncop.c:1918:gd_sync_task_begin] 0-management: Locking Peers Failed.
I'm not familiar with pstack, when running on node 3 (arbiter) I get only these few lines :
root@glusterDevVM3:~# pstack 13700
13700: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
(No symbols found)
0x7fafd747a6cd: ????
13700: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
(No symbols found)
0x7fafd747a6cd: ????
Which debian stretch package should I install ?
To be more explicit, I stopped glusterd on all 3 nodes then restart sequentially with this order : node1, node3 (arbiter) then node2.
Log files can be dl at https://www.dropbox.com/s/rcgcw7jrud2wkj1/glusterd-logs.tar.bz2?dl=0
Thanks for your help.
De: "Sanju Rakonde" <srakonde@xxxxxxxxxx>
À: nico@xxxxxxxxxx
Cc: "gluster-users" <gluster-users@xxxxxxxxxxx>
Envoyé: Mercredi 22 Avril 2020 07:23:52
Objet: Re: never ending logging
À: nico@xxxxxxxxxx
Cc: "gluster-users" <gluster-users@xxxxxxxxxxx>
Envoyé: Mercredi 22 Avril 2020 07:23:52
Objet: Re: never ending logging
Hi,
The email is talking about many issues. Let me ask a few questions to get a whole picture.
1. are the peers are in the connected state now? or they still in the rejected state?
2. What led you to see "locking failed" messages? We would like to if there is a reproducer and fix the issue if any.
3. Another transaction in progress message appears when there is already a operation going on. Are you seeing this when there is no such transaction going on?
4. When did you hit the timedouts? Did you tried to look at the pstack output of glusterd process? If so, please share the pstack output.
On Tue, Apr 21, 2020 at 7:08 PM <nico@xxxxxxxxxx> wrote:
Hi all.
We're using 3 nodes Gluster 7.3 (2 + 1 arbiter), yesterday node 2 was rejected from cluster and I applied following steps to fix : https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Administrator%20Guide/Resolving%20Peer%20Rejected/
I also saw https://docs.gluster.org/en/latest/Troubleshooting/troubleshooting-glusterd/ but solution isn't compatible as cluster.max-op-version doesn't exist and all op-version are the same on all 3 nodes.
After renewing SSL certs and several restart all volumes came back online but glusterd log file on all 3 nodes is filled with nothing else than following 3 lines :
[2020-04-21 13:05:19.478913] I [socket.c:4347:ssl_setup_connection_params] 0-socket.management: SSL support on the I/O path is ENABLED
[2020-04-21 13:05:19.478972] I [socket.c:4350:ssl_setup_connection_params] 0-socket.management: SSL support for glusterd is ENABLED
[2020-04-21 13:05:19.478986] I [socket.c:4360:ssl_setup_connection_params] 0-socket.management: using certificate depth 1
Moreover, I have "Locking failed", "Another transaction is in progress" and "Error : Request timed out" on gluster volume status volxxx command.
All SSL certs on clients have also been renewed and all volumes were remounted. All 3 nodes were alternatively restarted (glusterd) and rebooted.
The cluster is not in production environment but there's about ~250 clients for ~75 volumes, I don't know how to troubleshoot and fix this problem, if anyone has an idea.
________
Community Meeting Calendar:
Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
--
Thanks,
Sanju________ Community Meeting Calendar: Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC Bridge: https://bluejeans.com/441850968 Gluster-users mailing list Gluster-users@xxxxxxxxxxx https://lists.gluster.org/mailman/listinfo/gluster-users