On 05/07/2016 01:39 AM, Muminov, Azamat wrote:
> Hi,
>
> I have a ~50 node cluster. I configured gluster with two volumes: one
> on top of HDD, and the other on top of RAM.
>
> [root@nmIDPP20 ~]# gluster volume info
>
> Volume Name: ram
> Type: Distributed-Replicate
> Volume ID: a97fa262-276b-41e9-8f59-40f28451f689
> Status: Started
> Number of Bricks: 5 x 2 = 10
> Transport-type: tcp
> Bricks:
> Brick1: 10.238.0.15:/mnt/ram/data
> Brick2: 10.238.0.16:/mnt/ram/data
> Brick3: 10.238.0.17:/mnt/ram/data
> Brick4: 10.238.0.20:/mnt/ram/data
> Brick5: 10.238.0.19:/mnt/ram/data
> Brick6: 10.238.0.28:/mnt/ram/data
> Brick7: 10.238.0.27:/mnt/ram/data
> Brick8: 10.238.0.21:/mnt/ram/data
> Brick9: 10.238.0.24:/mnt/ram/data
> Brick10: 10.238.0.26:/mnt/ram/data
>
> Volume Name: disk
> Type: Replicate
> Volume ID: 9607ae5f-0dbf-4164-b260-5d9ce26d4fc7
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: 10.238.0.18:/var/cache/gluster/data/options/pp/data
> Brick2: 10.238.0.16:/var/cache/gluster/data/options/pp/data
> Brick3: 10.238.0.17:/var/cache/gluster/data/options/pp/data
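(A quick aside for list readers: a RAM-backed volume like "ram" above is
typically built on a tmpfs mount, roughly along the lines below. tmpfs and
the size are assumptions for illustration, not details taken from Azamat's
setup -- a ramfs or ramdisk device would look similar:)

    # On every brick node: back /mnt/ram with tmpfs and create the brick dir
    mkdir -p /mnt/ram
    mount -t tmpfs -o size=4g tmpfs /mnt/ram
    mkdir -p /mnt/ram/data

    # From any one node: ten bricks with replica 2 yields the 5 x 2
    # distributed-replicate layout shown in the volume info above
    gluster volume create ram replica 2 transport tcp \
        10.238.0.15:/mnt/ram/data 10.238.0.16:/mnt/ram/data \
        10.238.0.17:/mnt/ram/data 10.238.0.20:/mnt/ram/data \
        10.238.0.19:/mnt/ram/data 10.238.0.28:/mnt/ram/data \
        10.238.0.27:/mnt/ram/data 10.238.0.21:/mnt/ram/data \
        10.238.0.24:/mnt/ram/data 10.238.0.26:/mnt/ram/data
    gluster volume start ram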
>
> I've bare-metaled one of the servers, 10.238.0.22, and am now trying to
> add it back to the pool. After the _gluster peer probe 10.238.0.22_
> command, we can see that it's in the pool:
>
> [root@nmIDPP20 ~]# gluster pool list
> UUID                                    Hostname     State
> baa648a5-ff35-44e0-80ea-a55e43154d12    10.238.0.50  Connected
> 20bb470a-85da-4e3a-a66b-08a935c189ae    10.238.0.26  Connected
> 79dffcf8-8c3a-47b5-926a-39be2c1406da    10.238.0.13  Disconnected
> 7212e375-76a4-46c9-8bac-7470e2e5a910    10.238.0.17  Connected
> c6080a14-33d7-4012-8940-2d9232752551    10.238.0.14  Connected
> b553ed3c-21f1-4110-808d-4b08e6ded200    10.238.0.28  Connected
> 5e596931-9151-4f5b-bc57-feb6fe46054f    10.238.0.7   Connected
> 8e1128ed-df07-4747-812e-dcc280fce5c1    10.238.0.16  Connected
> 0b5fae30-e169-42ee-8f39-678d6fc93ac2    10.238.0.19  Connected
> 0f82df55-3994-4561-8a0a-1c1d2e9c3cff    10.238.0.29  Connected
> 446ea1e4-61b9-4881-9073-6aeb9a154710    10.238.0.24  Connected
> bcf84149-415b-4eb7-8dc1-2b284e135307    10.238.0.27  Connected
> 97dddf9f-0b57-4bb8-86fd-196cb51df4b6    10.238.0.20  Connected
> b2bf8b3c-890b-423b-b901-f16f1186c3e6    10.238.0.4   Connected
> 878ba732-0fea-4734-b1bc-a08ad7a2c97a    10.238.0.9   Connected
> 51750fb0-c182-4e76-821f-16cee23fdf27    10.238.0.6   Connected
> b162e108-4301-47df-875f-92151244b694    10.238.0.8   Connected
> 25d29db8-0916-4ef4-80d1-34fbf8aa5d26    10.238.0.21  Connected
> 9acfb879-7df9-4c87-aa1c-eb518b9c668d    10.238.0.12  Connected
> aacd1fa1-940c-4cec-9b04-1fb49348e764    10.238.0.49  Connected
> 5c36b282-9842-4b85-8d0f-e5101817dfe1    10.238.0.18  Connected
> a5298a13-144d-46e1-856f-91ade6649840    10.238.0.10  Connected
> 4e7b83bd-367e-419d-aa5b-34947021dbc3    10.238.0.48  Connected
> 6aa7957f-be6f-4bee-a748-32937d3ababd    10.238.0.47  Connected
> 3890ac7d-7959-4565-86de-fc792cc357b0    10.238.0.45  Disconnected
> 4814a743-5b52-44ab-b169-e907082aa229    10.238.0.32  Connected
> cf735cd8-75e3-413b-88c5-46e5b79f7558    10.238.0.42  Connected
> b1fa7e22-2e1b-4d07-966e-3096e58e5c78    10.238.0.39  Connected
> 1459fce8-110c-478f-815e-89507225226e    10.238.0.34  Connected
> a7b21ee9-970b-4d99-9f8f-b7e1cbf4be77    10.238.0.25  Connected
> dab1a271-4244-41bc-b770-7b13bd6e399d    10.238.0.43  Connected
> 5b483c65-0d04-4188-85a9-77dfbbef78cd    10.238.0.41  Connected
> 1b8cb9d8-ce8f-49aa-b958-705dd09db073    10.238.0.40  Connected
> 4b4f85a0-1310-45df-a613-e33c967cc53d    10.238.0.38  Connected
> dab043b8-11ba-4fa6-9b82-baa18b41167d    10.238.0.33  Disconnected
> 06cbc4c2-9d79-4689-9ac6-3dbc2250d903    10.238.0.30  Connected
> f33451c7-e984-495c-8e34-0b2d99a21e1e    10.238.0.31  Connected
> 1873e2ce-1239-4b6d-930f-af14e9c1f13b    10.238.0.5   Connected
> c85de12f-23e6-4797-adb4-d33b7b4eb5fc    10.238.0.11  Connected
> 4147639d-652e-49a8-aa8b-d77327cca9ca    10.238.0.15  Connected
> 07580a32-c558-449d-b454-044fb679c908    10.238.0.22  Connected
> d5140e78-498d-4c63-868d-189554aef7d4    localhost    Connected
>
> But _gluster peer status_ gives the following output:
>
> [root@nmIDPP20 ~]# gluster peer status
> Number of Peers: 41
>
> Hostname: 10.238.0.50
> Uuid: baa648a5-ff35-44e0-80ea-a55e43154d12
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.26
> Uuid: 20bb470a-85da-4e3a-a66b-08a935c189ae
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.13
> Uuid: 79dffcf8-8c3a-47b5-926a-39be2c1406da
> State: Peer in Cluster (Disconnected)
>
> Hostname: 10.238.0.17
> Uuid: 7212e375-76a4-46c9-8bac-7470e2e5a910
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.14
> Uuid: c6080a14-33d7-4012-8940-2d9232752551
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.28
> Uuid: b553ed3c-21f1-4110-808d-4b08e6ded200
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.7
> Uuid: 5e596931-9151-4f5b-bc57-feb6fe46054f
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.16
> Uuid: 8e1128ed-df07-4747-812e-dcc280fce5c1
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.19
> Uuid: 0b5fae30-e169-42ee-8f39-678d6fc93ac2
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.29
> Uuid: 0f82df55-3994-4561-8a0a-1c1d2e9c3cff
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.24
> Uuid: 446ea1e4-61b9-4881-9073-6aeb9a154710
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.27
> Uuid: bcf84149-415b-4eb7-8dc1-2b284e135307
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.20
> Uuid: 97dddf9f-0b57-4bb8-86fd-196cb51df4b6
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.4
> Uuid: b2bf8b3c-890b-423b-b901-f16f1186c3e6
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.9
> Uuid: 878ba732-0fea-4734-b1bc-a08ad7a2c97a
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.6
> Uuid: 51750fb0-c182-4e76-821f-16cee23fdf27
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.8
> Uuid: b162e108-4301-47df-875f-92151244b694
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.21
> Uuid: 25d29db8-0916-4ef4-80d1-34fbf8aa5d26
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.12
> Uuid: 9acfb879-7df9-4c87-aa1c-eb518b9c668d
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.49
> Uuid: aacd1fa1-940c-4cec-9b04-1fb49348e764
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.18
> Uuid: 5c36b282-9842-4b85-8d0f-e5101817dfe1
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.10
> Uuid: a5298a13-144d-46e1-856f-91ade6649840
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.48
> Uuid: 4e7b83bd-367e-419d-aa5b-34947021dbc3
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.47
> Uuid: 6aa7957f-be6f-4bee-a748-32937d3ababd
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.45
> Uuid: 3890ac7d-7959-4565-86de-fc792cc357b0
> State: Peer in Cluster (Disconnected)
>
> Hostname: 10.238.0.32
> Uuid: 4814a743-5b52-44ab-b169-e907082aa229
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.42
> Uuid: cf735cd8-75e3-413b-88c5-46e5b79f7558
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.39
> Uuid: b1fa7e22-2e1b-4d07-966e-3096e58e5c78
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.34
> Uuid: 1459fce8-110c-478f-815e-89507225226e
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.25
> Uuid: a7b21ee9-970b-4d99-9f8f-b7e1cbf4be77
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.43
> Uuid: dab1a271-4244-41bc-b770-7b13bd6e399d
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.41
> Uuid: 5b483c65-0d04-4188-85a9-77dfbbef78cd
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.40
> Uuid: 1b8cb9d8-ce8f-49aa-b958-705dd09db073
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.38
> Uuid: 4b4f85a0-1310-45df-a613-e33c967cc53d
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.33
> Uuid: dab043b8-11ba-4fa6-9b82-baa18b41167d
> State: Peer in Cluster (Disconnected)
>
> Hostname: 10.238.0.30
> Uuid: 06cbc4c2-9d79-4689-9ac6-3dbc2250d903
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.31
> Uuid: f33451c7-e984-495c-8e34-0b2d99a21e1e
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.5
> Uuid: 1873e2ce-1239-4b6d-930f-af14e9c1f13b
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.11
> Uuid: c85de12f-23e6-4797-adb4-d33b7b4eb5fc
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.15
> Uuid: 4147639d-652e-49a8-aa8b-d77327cca9ca
> State: Peer in Cluster (Connected)
>
> Hostname: 10.238.0.22
> Uuid: 07580a32-c558-449d-b454-044fb679c908
> State: Probe Sent to Peer (Connected)

That state indicates that the handshaking has not completed yet. I suggest
the following workaround: find the file named
07580a32-c558-449d-b454-044fb679c908 in the /var/lib/glusterd/peers
directory on all of the nodes (except node 22). This file contains the
details of the peer: its uuid, state, and hostname. Update the state to 3,
then restart glusterd on all 49 nodes, one by one. Please note that the
handshaking will take some time (~10 minutes); after that you should be
able to see this node back in the cluster.

Let me know if this doesn't work. I'll be happy to assist you further.
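To make the state edit concrete, it is roughly the sketch below. The
peers-file layout shown is the stock one for 3.x glusterd, but please
verify it on your nodes before editing:

    # On every node EXCEPT 10.238.0.22. The peer file normally looks like:
    #   uuid=07580a32-c558-449d-b454-044fb679c908
    #   state=<current state>
    #   hostname1=10.238.0.22
    f=/var/lib/glusterd/peers/07580a32-c558-449d-b454-044fb679c908
    sed -i 's/^state=.*/state=3/' "$f"   # 3 = befriended, i.e. "Peer in Cluster"
    service glusterd restart             # or: systemctl restart glusterd

    # Once the handshake settles (~10 min), verify from any node:
    gluster peer status | grep -A 2 'Hostname: 10.238.0.22'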
> And after staying in this state for about 10 minutes, the .22 node
> disappears from the pool list. Also, during the peer probe, if you run
> _gluster pool list_ on node .22, it hangs and does not do anything;
> only after a few minutes does it release the shell, and it outputs
> nothing.
>
> I've tried a couple of things to resolve the issue:
>
> 1. Disabled the firewall -> didn't help
> 2. Removed the mgmt directory from .22, restarted the gluster service
>    and the glusterfs/glusterd processes -> didn't help
> 3. Tried to probe .22 from another server -> didn't help
> 4. Reset the uuid of .22 -> didn't help
>
> I don't know what more I can do, so I'm asking for support from you.
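(On item 4, for reference: resetting the UUID on a reinstalled node usually
amounts to something like the sketch below. This assumes the stock
/var/lib/glusterd layout, since the exact steps taken aren't shown;
glusterd regenerates glusterd.info, with a fresh UUID, at startup when the
file is missing:)

    # On 10.238.0.22:
    service glusterd stop
    mv /var/lib/glusterd/glusterd.info /var/lib/glusterd/glusterd.info.bak
    service glusterd start
    cat /var/lib/glusterd/glusterd.info   # inspect the newly generated UUID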
> The following are the logs during the probe, from .22 and .23:
>
> 10.238.0.22:
>
> [2016-05-06 19:45:24.463346] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
> [2016-05-06 19:46:01.295054] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
> [2016-05-06 19:46:50.518018] I [glusterd-handshake.c:563:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30501
> [2016-05-06 19:46:50.521829] I [glusterd-handler.c:2346:__glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: d5140e78-498d-4c63-868d-189554aef7d4
> [2016-05-06 19:47:10.542419] I [glusterd-handler.c:2374:__glusterd_handle_probe_query] 0-glusterd: Unable to find peerinfo for host: 10.238.0.23 (24007)
> [2016-05-06 19:47:10.548116] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
> [2016-05-06 19:47:10.548218] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled
> [2016-05-06 19:47:10.548239] I [socket.c:3576:socket_init] 0-management: using system polling thread
> [2016-05-06 19:47:10.553769] I [glusterd-handler.c:2912:glusterd_friend_add] 0-management: connect returned 0
> [2016-05-06 19:47:10.553886] I [glusterd-handler.c:2398:__glusterd_handle_probe_query] 0-glusterd: Responded to 10.238.0.23, op_ret: 0, op_errno: 0, ret: 0
> [2016-05-06 19:47:10.554650] I [glusterd-handler.c:2050:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: d5140e78-498d-4c63-868d-189554aef7d4
> [2016-05-06 19:50:50.812036] E [glusterd-utils.c:4692:glusterd_brick_start] 0-management: Could not find peer on which brick 10.238.0.15:/mnt/ram/data resides
>
> 10.238.0.23:
>
> [2016-05-06 19:46:28.982091] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
> [2016-05-06 19:46:31.930017] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:34.934960] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:37.916015] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:40.947036] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:43.950373] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:46.961104] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:49.966875] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:50.497510] I [glusterd-handler.c:918:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 10.238.0.22 24007
> [2016-05-06 19:46:50.502555] I [glusterd-handler.c:2931:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 10.238.0.22 (24007)
> [2016-05-06 19:46:50.511183] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
> [2016-05-06 19:46:50.511279] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled
> [2016-05-06 19:46:50.511300] I [socket.c:3576:socket_init] 0-management: using system polling thread
> [2016-05-06 19:46:50.517005] I [glusterd-handler.c:2912:glusterd_friend_add] 0-management: connect returned 0
> [2016-05-06 19:46:52.983838] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:55.975533] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:46:58.989536] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:01.994423] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:04.995025] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:07.995849] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:10.553738] I [glusterd-rpc-ops.c:234:__glusterd_probe_cbk] 0-glusterd: Received probe resp from uuid: 07580a32-c558-449d-b454-044fb679c908, host: 10.238.0.22
> [2016-05-06 19:47:10.559641] I [glusterd-rpc-ops.c:306:__glusterd_probe_cbk] 0-glusterd: Received resp to probe req
> [2016-05-06 19:47:11.006166] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:14.009705] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:16.996479] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:20.024705] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:23.035546] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
> [2016-05-06 19:47:26.041132] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
>
> Please advise on anything that might help handle this issue.
>
> Thanks,
> Azamat
> Phone: 703-667-8922
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users