Even though I'm seeing disconnected nodes (including nodes that were already in the pool), my volume is still intact and available. So I'm guessing that glusterd has little to do with the volume/brick service?
Am I safe to kill glusterd on all servers and start the whole peer probing process over again?
If I do this, will the currently mounted volumes become unavailable?
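The glusterd log in the message quoted below repeatedly reports "failed to set TCP_USER_TIMEOUT ... Protocol not available". "Protocol not available" is the C library's text for errno ENOPROTOOPT, which setsockopt() returns when an option is not supported at the requested level. As a rough local check (an assumption about what is worth testing, not something suggested in this thread), the Python sketch below tries to set TCP_USER_TIMEOUT on a fresh TCP socket and reports the resulting error text. try_tcp_user_timeout is a hypothetical helper, and the fallback value 18 is the Linux constant from <linux/tcp.h>. It does not reproduce glusterd's own code path, which per the log passed -1000 and 0.

```python
import errno
import os
import socket

# TCP_USER_TIMEOUT is Linux-specific; fall back to 18, its value in <linux/tcp.h>.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", 18)


def try_tcp_user_timeout(timeout_ms: int) -> str:
    """Try to set TCP_USER_TIMEOUT (in milliseconds) on a new TCP socket.

    Returns "ok" on success, otherwise the strerror() text of the failure,
    e.g. "Protocol not available" when the kernel reports ENOPROTOOPT.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms)
    except OSError as exc:
        return os.strerror(exc.errno) if exc.errno else str(exc)
    return "ok"


if __name__ == "__main__":
    # The text glusterd logged is the strerror() text for ENOPROTOOPT.
    print("ENOPROTOOPT:", os.strerror(errno.ENOPROTOOPT))
    print("TCP_USER_TIMEOUT=30000:", try_tcp_user_timeout(30000))
```

On a kernel that supports the option, the second line should print "ok"; an ENOPROTOOPT result there would suggest the kernel itself does not implement TCP_USER_TIMEOUT.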
2015-08-31 17:47 GMT+08:00 Yiping Peng <barius.cn@xxxxxxxxx>:
The "Disconnected" state of nodes changes randomly, so I picked a node at random and tailed the last several lines of /var/log/glusterfs/etc-glusterfs-glusterd.vol.log (is that the right log file?). I can still access the cluster from servers already in the pool; both reading and writing work fine. The log shows a lot of "Failed to set keep-alive: Protocol not available" messages:
Thanks.

[2015-08-31 09:38:25.586073] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:27.193523] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 8ed2d6cf-9758-4adf-8ed2-2d87f76491cf
[2015-08-31 09:38:27.209085] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:27.370367] C [rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-management: server 10.88.153.23:24007 has not responded in the last 30 seconds, disconnecting.
[2015-08-31 09:38:28.803311] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 05885701-9a7c-4d2a-b18a-b5d9de2ccd57
[2015-08-31 09:38:28.818834] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
The message "I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f7de5463-080d-4547-9601-0e9541dea928" repeated 4 times between [2015-08-31 09:36:30.776194] and [2015-08-31 09:38:06.162677]
The message "I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 62eb172c-58ac-47c8-931e-05e5ad5a3133" repeated 4 times between [2015-08-31 09:36:32.404743] and [2015-08-31 09:38:07.779594]
[2015-08-31 09:38:30.419141] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <server62.yq01.local.net> (<3d354922-4bcd-4469-9e2e-559067882217>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:30.419188] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <server52.yq01.local.net> (<6466759d-05eb-406e-9ede-a36dbf26c216>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:30.419299] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 62eb172c-58ac-47c8-931e-05e5ad5a3133
[2015-08-31 09:38:30.434835] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:32.035177] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 4db788d9-d372-4f57-a0f4-ba11d480013d
[2015-08-31 09:38:33.373803] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 69, Protocol not available
[2015-08-31 09:38:33.373821] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:33.376719] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 70, Protocol not available
[2015-08-31 09:38:33.376735] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:32.050834] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:33.651240] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 9a291ec2-8f75-47fa-b4f4-c3edc02e9ce8
[2015-08-31 09:38:33.666825] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:35.267184] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <server75.yq01.local.net> (<aeb43c67-1dd3-45e9-abbf-cc0037472724>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:35.267237] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/7abc6dc0317b0f84408f0bc69917073c.socket failed (Invalid argument)
[2015-08-31 09:38:35.267253] I [MSGID: 106006] [glusterd-svc-mgmt.c:319:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd.
[2015-08-31 09:38:35.267352] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: df2686ca-e020-4593-97d8-bd50de4b2775
[2015-08-31 09:38:35.282829] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:36.877526] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fb93d5801b7] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb93d5802ce] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fb93d58039b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fb93d58095f] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2015-08-31 09:37:43.506542 (xid=0x1535)
[2015-08-31 09:38:36.877553] E [MSGID: 106167] [glusterd-handshake.c:2078:__glusterd_peer_dump_version_cbk] 0-management: Error through RPC layer, retry again later
[2015-08-31 09:38:36.877643] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fb93d5801b7] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb93d5802ce] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fb93d58039b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fb93d58095f] ))))) 0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-08-31 09:37:43.506554 (xid=0x1536)
[2015-08-31 09:38:36.877659] W [rpc-clnt-ping.c:204:rpc_clnt_ping_cbk] 0-management: socket disconnected
[2015-08-31 09:38:36.877676] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <server6.yq01.local.net> (<eb491a24-3edd-494a-90c0-b4280bd6995e>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:36.877823] W [glusterd-locks.c:677:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (--> /usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x551)[0x7fb93316a111] (--> /usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2f0)[0x7fb9330d0300] (--> /usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7fb9330b3a50] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x7fb93d5809a3] ))))) 0-management: Lock for vol speech0 not held
[2015-08-31 09:38:36.877840] W [MSGID: 106118] [glusterd-handler.c:5073:__glusterd_peer_rpc_notify] 0-management: Lock not released for speech0
[2015-08-31 09:38:36.877889] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <server48.yq01.local.net> (<372c820d-003e-4885-870c-547ca17f6770>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:36.878012] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: d903d2f1-458d-43ae-a057-3f4999d3123a
[2015-08-31 09:38:36.893088] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:37.380052] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 12, Protocol not available
[2015-08-31 09:38:37.380071] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:38.492491] W [socket.c:642:__socket_rwv] 0-socket.management: writev on 10.88.155.28:65379 failed (Broken pipe)
[2015-08-31 09:38:38.492510] I [socket.c:2409:socket_event_handler] 0-transport: disconnecting now
[2015-08-31 09:38:38.492565] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 5, Protocol not available
[2015-08-31 09:38:38.492576] W [socket.c:2673:socket_server_event_handler] 0-socket.management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:38.492669] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <worker09.yq01.local.net> (<c0f4eab2-9cdd-4ba8-a002-259456288fd3>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:38.492715] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <server53.yq01.local.net> (<b1f15cce-36e4-4ef4-a22f-70bafb0bf8d3>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:38.492786] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 96aa9f85-f979-42a8-ac0a-1136384fbc14
[2015-08-31 09:38:38.508078] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:39.383260] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 27, Protocol not available
[2015-08-31 09:38:39.383280] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:40.108404] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 72e2074f-921d-45d6-9601-deee653075a9
[2015-08-31 09:38:40.124073] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:41.386485] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 23, Protocol not available
[2015-08-31 09:38:41.386506] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:41.389473] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 30, Protocol not available
[2015-08-31 09:38:41.389486] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:41.733507] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f1c1b3d9-326d-4730-b1b0-788690da2ce1
[2015-08-31 09:38:41.749079] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:43.348570] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 455da276-9ef5-46ab-90f9-457a70432224
[2015-08-31 09:38:43.364074] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:44.964456] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <server43.yq01.local.net> (<76cb46d9-5669-47db-b264-68b55d4c37f0>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:44.964578] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 00d5caae-b647-4dae-8d3e-df1e7f08941f
[2015-08-31 09:38:44.980073] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:45.392805] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 38, Protocol not available
[2015-08-31 09:38:45.392825] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:46.393009] C [rpc-clnt-ping.c:161:rpc_clnt_ping_timer_expired] 0-management: server 10.88.155.15:24007 has not responded in the last 30 seconds, disconnecting.
[2015-08-31 09:38:46.584515] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: e204bc20-9c4f-449c-9dfc-f6e54b96bf8c
[2015-08-31 09:38:46.600079] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:47.396000] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 35, Protocol not available
[2015-08-31 09:38:47.396019] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:48.198525] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 607e3f7a-65e6-423a-9226-5f763f9838e8
[2015-08-31 09:38:48.214089] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:49.815541] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: e2322b18-2e5f-4c3c-8cc2-84b137fa7328
[2015-08-31 09:38:49.831078] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:51.434550] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fb93d5801b7] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb93d5802ce] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fb93d58039b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fb93d58095f] ))))) 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called at 2015-08-31 09:37:56.464514 (xid=0x1315)
[2015-08-31 09:38:51.434579] E [MSGID: 106167] [glusterd-handshake.c:2078:__glusterd_peer_dump_version_cbk] 0-management: Error through RPC layer, retry again later
[2015-08-31 09:38:51.434669] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fb93d5801b7] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb93d5802ce] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fb93d58039b] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fb93d58095f] ))))) 0-management: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2015-08-31 09:37:56.464526 (xid=0x1316)
[2015-08-31 09:38:51.434685] W [rpc-clnt-ping.c:204:rpc_clnt_ping_cbk] 0-management: socket disconnected
[2015-08-31 09:38:51.434704] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <server42.yq01.local.net> (<0b24198f-dfad-4259-bc22-9f3736f53824>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:38:51.434850] W [glusterd-locks.c:677:glusterd_mgmt_v3_unlock] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fb93d7b465b] (--> /usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x551)[0x7fb93316a111] (--> /usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x2f0)[0x7fb9330d0300] (--> /usr/lib64/glusterfs/3.7.3/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7fb9330b3a50] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3)[0x7fb93d5809a3] ))))) 0-management: Lock for vol speech0 not held
[2015-08-31 09:38:51.434867] W [MSGID: 106118] [glusterd-handler.c:5073:__glusterd_peer_rpc_notify] 0-management: Lock not released for speech0
[2015-08-31 09:38:51.434994] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 8ed2d6cf-9758-4adf-8ed2-2d87f76491cf
[2015-08-31 09:38:51.450075] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:53.049543] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: f7de5463-080d-4547-9601-0e9541dea928
[2015-08-31 09:38:53.065083] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:54.666534] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 05885701-9a7c-4d2a-b18a-b5d9de2ccd57
[2015-08-31 09:38:54.682066] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:57.399884] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 45, Protocol not available
[2015-08-31 09:38:57.399906] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:57.402816] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 69, Protocol not available
[2015-08-31 09:38:57.402830] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available
[2015-08-31 09:38:56.301076] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:57.897551] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 9a291ec2-8f75-47fa-b4f4-c3edc02e9ce8
[2015-08-31 09:38:57.913072] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:38:59.513520] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: df2686ca-e020-4593-97d8-bd50de4b2775
[2015-08-31 09:38:59.529073] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:39:01.129419] I [MSGID: 106004] [glusterd-handler.c:5051:__glusterd_peer_rpc_notify] 0-management: Peer <server75.yq01.local.net> (<aeb43c67-1dd3-45e9-abbf-cc0037472724>), in state <Peer in Cluster>, has disconnected from glusterd.
[2015-08-31 09:39:01.129469] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/7abc6dc0317b0f84408f0bc69917073c.socket failed (Invalid argument)
[2015-08-31 09:39:01.129484] I [MSGID: 106006] [glusterd-svc-mgmt.c:319:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd.
[2015-08-31 09:39:01.129587] I [MSGID: 106492] [glusterd-handler.c:2706:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: d903d2f1-458d-43ae-a057-3f4999d3123a
[2015-08-31 09:39:01.145074] I [MSGID: 106502] [glusterd-handler.c:2751:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2015-08-31 09:39:01.406146] W [socket.c:923:__socket_keepalive] 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 12, Protocol not available
[2015-08-31 09:39:01.406168] E [socket.c:3019:socket_connect] 0-management: Failed to set keep-alive: Protocol not available

2015-08-31 16:54 GMT+08:00 Atin Mukherjee <amukherj@xxxxxxxxxx>:
On 08/31/2015 01:10 PM, Yiping Peng wrote:
> Hi guys,
>
>
> I've been running GlusterFS for a couple of days and it's been nice and
> steady, except for a minor problem: peer probing on my relatively large
> cluster seems to have been stuck for a long time.
>
>
> Last time atinm told me on IRC (I was barius.2333 on IRC) that on a cluster
> as large as 50+ nodes, peer probing might take a long time (O(n^2) time),
> and now my cluster has expanded to 90+ nodes.
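As a back-of-the-envelope illustration of the O(n^2) remark, assuming the work grows with the number of distinct node pairs (a simplification, not a description of glusterd's actual handshake), going from ~50 to 90+ nodes more than triples the pair count. peer_pairs is a hypothetical helper:

```python
def peer_pairs(n: int) -> int:
    """Distinct unordered node pairs in an n-node pool: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for nodes in (50, 90):
        # Prints "50 nodes -> 1225 pairs" and "90 nodes -> 4005 pairs".
        print(f"{nodes} nodes -> {peer_pairs(nodes)} pairs")
```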
>
>
> The peer probing process was started 4 days ago, when my cluster had ~50
> nodes. I probed ~40 nodes at once using subprocesses in bash, and the
> commands all returned successfully almost immediately (no time-outs).
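A Python sketch of the probing step described above, for illustration only. probe_hosts, max_parallel and the injectable run callable are hypothetical names; the only real command used is "gluster peer probe <host>". Setting max_parallel to the number of hosts approximates launching every probe at once; a smaller value throttles them, though whether throttling helps here is not established in this thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Sequence


def default_run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def probe_hosts(
    hosts: Iterable[str],
    max_parallel: int,
    run: Callable[[Sequence[str]], int] = default_run,
) -> Dict[str, int]:
    """Issue "gluster peer probe <host>" per host, at most max_parallel at once.

    Returns a mapping host -> exit status of its probe command.
    """
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        codes = pool.map(lambda h: run(["gluster", "peer", "probe", h]), hosts)
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(hosts, codes))
```

A zero exit status only means the probe command returned; as the rest of this message describes, glusterd kept working on the new peers long after the commands had finished.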
>
>
> However, glusterd has kept writing to /var/lib/glusterd/peers/ for the
> last 4 days, and all commands related to the newly added nodes, e.g.
> add-brick and mount, time out and fail. Also, running “gluster peer status”
> on my nodes shows “Disconnected” nodes that vary over time.
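To track which peers show up as Disconnected over time, the output of "gluster peer status" can be filtered. A minimal sketch, assuming the Hostname/Uuid/State layout shown in the SAMPLE string (the hostnames and UUIDs in it are made up); disconnected_peers is a hypothetical helper:

```python
import re
from typing import List

# One peer block in "gluster peer status" output (layout assumed from SAMPLE).
PEER_RE = re.compile(
    r"Hostname:\s*(?P<host>\S+)\s*\n"
    r"Uuid:\s*(?P<uuid>\S+)\s*\n"
    r"State:\s*(?P<state>[^\n(]+?)\s*\((?P<conn>\w+)\)"
)

SAMPLE = """Number of Peers: 2

Hostname: node-a.example
Uuid: 00000000-0000-0000-0000-00000000000a
State: Peer in Cluster (Disconnected)

Hostname: node-b.example
Uuid: 00000000-0000-0000-0000-00000000000b
State: Peer in Cluster (Connected)
"""


def disconnected_peers(status_output: str) -> List[str]:
    """Return the hostnames whose State line ends in "(Disconnected)"."""
    return [
        m.group("host")
        for m in PEER_RE.finditer(status_output)
        if m.group("conn") == "Disconnected"
    ]


if __name__ == "__main__":
    print(disconnected_peers(SAMPLE))  # ['node-a.example']
```

In practice the string would come from running the command, e.g. subprocess.run(["gluster", "peer", "status"], capture_output=True, text=True).stdout, repeated periodically to see how the set of disconnected peers changes.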
Peer status should not show a node in the Disconnected state even if the
peer handshake takes a long time; if it does, then something is wrong.
Could you check which node is disconnected and what the glusterd log file
on that node indicates?
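Following that suggestion, one rough way to see which peers disconnect most often is to count the matching entries in the glusterd log (/var/log/glusterfs/etc-glusterfs-glusterd.vol.log, per the log excerpt pasted above). This Python sketch assumes the message formats seen in that excerpt; summarize and the two patterns are hypothetical names, not part of any Gluster tool.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Patterns copied from the glusterd 3.7.3 log lines pasted in this thread.
DISCONNECT_RE = re.compile(
    r"Peer <(?P<host>[^>]+)> \(<(?P<uuid>[^>]+)>\), in state <[^>]+>, "
    r"has disconnected from glusterd"
)
PING_TIMEOUT_RE = re.compile(
    r"server (?P<addr>\S+) has not responded in the last \d+ seconds"
)


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count peer disconnects (by hostname) and ping timeouts (by address)."""
    disconnects: Counter = Counter()
    ping_timeouts: Counter = Counter()
    for line in lines:
        # finditer also handles several log entries run together on one line.
        for m in DISCONNECT_RE.finditer(line):
            disconnects[m.group("host")] += 1
        for m in PING_TIMEOUT_RE.finditer(line):
            ping_timeouts[m.group("addr")] += 1
    return disconnects, ping_timeouts
```

For example, opening the log file, passing the file object to summarize, and printing disconnects.most_common(10) lists the ten peers that disconnected most often from that node's point of view.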
>
>
> What shall I do in such a situation? Do I need to wait for the whole peer
> probing process to complete, or can I simply kill glusterd and restart
> it?
>
>
> Regards,
>
> Yiping Peng
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-users
>