Can not add peer


 



Hi,

 

I have a ~50-node cluster. I configured Gluster with two volumes: one on top of HDD and the other on top of RAM.

 

[root@nmIDPP20 ~]# gluster volume info

Volume Name: ram

Type: Distributed-Replicate

Volume ID: a97fa262-276b-41e9-8f59-40f28451f689

Status: Started

Number of Bricks: 5 x 2 = 10

Transport-type: tcp

Bricks:

Brick1: 10.238.0.15:/mnt/ram/data

Brick2: 10.238.0.16:/mnt/ram/data

Brick3: 10.238.0.17:/mnt/ram/data

Brick4: 10.238.0.20:/mnt/ram/data

Brick5: 10.238.0.19:/mnt/ram/data

Brick6: 10.238.0.28:/mnt/ram/data

Brick7: 10.238.0.27:/mnt/ram/data

Brick8: 10.238.0.21:/mnt/ram/data

Brick9: 10.238.0.24:/mnt/ram/data

Brick10: 10.238.0.26:/mnt/ram/data

Volume Name: disk

Type: Replicate

Volume ID: 9607ae5f-0dbf-4164-b260-5d9ce26d4fc7

Status: Started

Number of Bricks: 1 x 3 = 3

Transport-type: tcp

Bricks:

Brick1: 10.238.0.18:/var/cache/gluster/data/options/pp/data

Brick2: 10.238.0.16:/var/cache/gluster/data/options/pp/data

Brick3: 10.238.0.17:/var/cache/gluster/data/options/pp/data

 

 

I reinstalled one of the servers (10.238.0.22) from bare metal and am now trying to add it back to the pool. After running gluster peer probe 10.238.0.22, the node shows up in the pool:

 

[root@nmIDPP20 ~]# gluster pool list

UUID                                                                 Hostname           State

baa648a5-ff35-44e0-80ea-a55e43154d12              10.238.0.50        Connected

20bb470a-85da-4e3a-a66b-08a935c189ae            10.238.0.26        Connected

79dffcf8-8c3a-47b5-926a-39be2c1406da               10.238.0.13        Disconnected

7212e375-76a4-46c9-8bac-7470e2e5a910             10.238.0.17        Connected

c6080a14-33d7-4012-8940-2d9232752551            10.238.0.14        Connected

b553ed3c-21f1-4110-808d-4b08e6ded200             10.238.0.28        Connected

5e596931-9151-4f5b-bc57-feb6fe46054f 10.238.0.7           Connected

8e1128ed-df07-4747-812e-dcc280fce5c1               10.238.0.16        Connected

0b5fae30-e169-42ee-8f39-678d6fc93ac2               10.238.0.19        Connected

0f82df55-3994-4561-8a0a-1c1d2e9c3cff 10.238.0.29        Connected

446ea1e4-61b9-4881-9073-6aeb9a154710            10.238.0.24        Connected

bcf84149-415b-4eb7-8dc1-2b284e135307             10.238.0.27        Connected

97dddf9f-0b57-4bb8-86fd-196cb51df4b6 10.238.0.20        Connected

b2bf8b3c-890b-423b-b901-f16f1186c3e6               10.238.0.4           Connected

878ba732-0fea-4734-b1bc-a08ad7a2c97a             10.238.0.9           Connected

51750fb0-c182-4e76-821f-16cee23fdf27 10.238.0.6           Connected

b162e108-4301-47df-875f-92151244b694              10.238.0.8           Connected

25d29db8-0916-4ef4-80d1-34fbf8aa5d26              10.238.0.21        Connected

9acfb879-7df9-4c87-aa1c-eb518b9c668d               10.238.0.12        Connected

aacd1fa1-940c-4cec-9b04-1fb49348e764               10.238.0.49        Connected

5c36b282-9842-4b85-8d0f-e5101817dfe1              10.238.0.18        Connected

a5298a13-144d-46e1-856f-91ade6649840             10.238.0.10        Connected

4e7b83bd-367e-419d-aa5b-34947021dbc3            10.238.0.48        Connected

6aa7957f-be6f-4bee-a748-32937d3ababd              10.238.0.47        Connected

3890ac7d-7959-4565-86de-fc792cc357b0              10.238.0.45        Disconnected

4814a743-5b52-44ab-b169-e907082aa229            10.238.0.32        Connected

cf735cd8-75e3-413b-88c5-46e5b79f7558              10.238.0.42        Connected

b1fa7e22-2e1b-4d07-966e-3096e58e5c78             10.238.0.39        Connected

1459fce8-110c-478f-815e-89507225226e              10.238.0.34        Connected

a7b21ee9-970b-4d99-9f8f-b7e1cbf4be77               10.238.0.25        Connected

dab1a271-4244-41bc-b770-7b13bd6e399d            10.238.0.43        Connected

5b483c65-0d04-4188-85a9-77dfbbef78cd              10.238.0.41        Connected

1b8cb9d8-ce8f-49aa-b958-705dd09db073             10.238.0.40        Connected

4b4f85a0-1310-45df-a613-e33c967cc53d              10.238.0.38        Connected

dab043b8-11ba-4fa6-9b82-baa18b41167d             10.238.0.33        Disconnected

06cbc4c2-9d79-4689-9ac6-3dbc2250d903             10.238.0.30        Connected

f33451c7-e984-495c-8e34-0b2d99a21e1e             10.238.0.31        Connected

1873e2ce-1239-4b6d-930f-af14e9c1f13b               10.238.0.5           Connected

c85de12f-23e6-4797-adb4-d33b7b4eb5fc              10.238.0.11        Connected

4147639d-652e-49a8-aa8b-d77327cca9ca             10.238.0.15        Connected

07580a32-c558-449d-b454-044fb679c908             10.238.0.22        Connected

d5140e78-498d-4c63-868d-189554aef7d4             localhost             Connected

 

 

However, gluster peer status gives the following output:

 

[root@nmIDPP20 ~]# gluster peer status

Number of Peers: 41

 

Hostname: 10.238.0.50

Uuid: baa648a5-ff35-44e0-80ea-a55e43154d12

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.26

Uuid: 20bb470a-85da-4e3a-a66b-08a935c189ae

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.13

Uuid: 79dffcf8-8c3a-47b5-926a-39be2c1406da

State: Peer in Cluster (Disconnected)

 

Hostname: 10.238.0.17

Uuid: 7212e375-76a4-46c9-8bac-7470e2e5a910

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.14

Uuid: c6080a14-33d7-4012-8940-2d9232752551

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.28

Uuid: b553ed3c-21f1-4110-808d-4b08e6ded200

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.7

Uuid: 5e596931-9151-4f5b-bc57-feb6fe46054f

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.16

Uuid: 8e1128ed-df07-4747-812e-dcc280fce5c1

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.19

Uuid: 0b5fae30-e169-42ee-8f39-678d6fc93ac2

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.29

Uuid: 0f82df55-3994-4561-8a0a-1c1d2e9c3cff

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.24

Uuid: 446ea1e4-61b9-4881-9073-6aeb9a154710

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.27

Uuid: bcf84149-415b-4eb7-8dc1-2b284e135307

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.20

Uuid: 97dddf9f-0b57-4bb8-86fd-196cb51df4b6

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.4

Uuid: b2bf8b3c-890b-423b-b901-f16f1186c3e6

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.9

Uuid: 878ba732-0fea-4734-b1bc-a08ad7a2c97a

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.6

Uuid: 51750fb0-c182-4e76-821f-16cee23fdf27

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.8

Uuid: b162e108-4301-47df-875f-92151244b694

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.21

Uuid: 25d29db8-0916-4ef4-80d1-34fbf8aa5d26

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.12

Uuid: 9acfb879-7df9-4c87-aa1c-eb518b9c668d

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.49

Uuid: aacd1fa1-940c-4cec-9b04-1fb49348e764

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.18

Uuid: 5c36b282-9842-4b85-8d0f-e5101817dfe1

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.10

Uuid: a5298a13-144d-46e1-856f-91ade6649840

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.48

Uuid: 4e7b83bd-367e-419d-aa5b-34947021dbc3

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.47

Uuid: 6aa7957f-be6f-4bee-a748-32937d3ababd

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.45

Uuid: 3890ac7d-7959-4565-86de-fc792cc357b0

State: Peer in Cluster (Disconnected)

 

Hostname: 10.238.0.32

Uuid: 4814a743-5b52-44ab-b169-e907082aa229

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.42

Uuid: cf735cd8-75e3-413b-88c5-46e5b79f7558

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.39

Uuid: b1fa7e22-2e1b-4d07-966e-3096e58e5c78

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.34

Uuid: 1459fce8-110c-478f-815e-89507225226e

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.25

Uuid: a7b21ee9-970b-4d99-9f8f-b7e1cbf4be77

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.43

Uuid: dab1a271-4244-41bc-b770-7b13bd6e399d

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.41

Uuid: 5b483c65-0d04-4188-85a9-77dfbbef78cd

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.40

Uuid: 1b8cb9d8-ce8f-49aa-b958-705dd09db073

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.38

Uuid: 4b4f85a0-1310-45df-a613-e33c967cc53d

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.33

Uuid: dab043b8-11ba-4fa6-9b82-baa18b41167d

State: Peer in Cluster (Disconnected)

 

Hostname: 10.238.0.30

Uuid: 06cbc4c2-9d79-4689-9ac6-3dbc2250d903

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.31

Uuid: f33451c7-e984-495c-8e34-0b2d99a21e1e

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.5

Uuid: 1873e2ce-1239-4b6d-930f-af14e9c1f13b

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.11

Uuid: c85de12f-23e6-4797-adb4-d33b7b4eb5fc

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.15

Uuid: 4147639d-652e-49a8-aa8b-d77327cca9ca

State: Peer in Cluster (Connected)

 

Hostname: 10.238.0.22

Uuid: 07580a32-c558-449d-b454-044fb679c908

State: Probe Sent to Peer (Connected)
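
With 41 peers, the one entry in a different state is easy to miss. The following is a minimal Python sketch, assuming the Hostname / Uuid / State layout shown above, that lists every peer whose state is not "Peer in Cluster (Connected)". The function name unhealthy_peers is made up for illustration, and the sample text is a trimmed copy of the output above.

```python
def unhealthy_peers(status_text):
    """Return (hostname, state) pairs for peers not in 'Peer in Cluster (Connected)'."""
    peers = []
    host = None
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("Hostname:"):
            # Remember the host until its State line arrives.
            host = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and host is not None:
            state = line.split(":", 1)[1].strip()
            if state != "Peer in Cluster (Connected)":
                peers.append((host, state))
            host = None
    return peers


# Trimmed sample taken from the peer status output above.
sample = """\
Hostname: 10.238.0.50
Uuid: baa648a5-ff35-44e0-80ea-a55e43154d12
State: Peer in Cluster (Connected)

Hostname: 10.238.0.13
Uuid: 79dffcf8-8c3a-47b5-926a-39be2c1406da
State: Peer in Cluster (Disconnected)

Hostname: 10.238.0.22
Uuid: 07580a32-c558-449d-b454-044fb679c908
State: Probe Sent to Peer (Connected)
"""

for host, state in unhealthy_peers(sample):
    print(host, "->", state)
```

On a live node the same function could be fed the text captured from gluster peer status instead of the sample string.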

 

 

After staying in this state for about 10 minutes, the .22 node disappears from the pool list. Also, while the probe is in progress, running gluster pool list on node .22 hangs; only after a few minutes does it return to the shell, and it prints nothing.

 

I’ve tried several things to resolve the issue:

1. Disabled the firewall -> didn’t help

2. Removed the mgmt directory from .22, then restarted the gluster service and the glusterfs/d processes -> didn’t help

3. Tried to probe .22 from another server -> didn’t help

4. Reset the UUID of .22 -> didn’t help
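
To double-check the firewall step independently of the gluster CLI, a plain TCP connect to the glusterd management port (24007) can be tried from each side. This is only a sketch: the helper name port_open and the 3-second timeout are assumptions, and the demo runs against a throwaway local listener rather than a real node.

```python
import socket


def port_open(host, port=24007, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Self-contained demo against a temporary local listener, so it runs anywhere.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]
print(port_open("127.0.0.1", demo_port))
listener.close()
```

Against the cluster this would be called as port_open("10.238.0.22"). A successful connect only shows that the port is reachable; it says nothing about whether the management handshake completes.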

 

I don’t know what else to try, so I’m asking for your help.

 

 

The following are the logs from .22 and .23 during the probe:

 

10.238.0.22:

 

[2016-05-06 19:45:24.463346] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req

[2016-05-06 19:46:01.295054] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req

[2016-05-06 19:46:50.518018] I [glusterd-handshake.c:563:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30501

[2016-05-06 19:46:50.521829] I [glusterd-handler.c:2346:__glusterd_handle_probe_query] 0-glusterd: Received probe from uuid: d5140e78-498d-4c63-868d-189554aef7d4

[2016-05-06 19:47:10.542419] I [glusterd-handler.c:2374:__glusterd_handle_probe_query] 0-glusterd: Unable to find peerinfo for host: 10.238.0.23 (24007)

[2016-05-06 19:47:10.548116] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600

[2016-05-06 19:47:10.548218] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled

[2016-05-06 19:47:10.548239] I [socket.c:3576:socket_init] 0-management: using system polling thread

[2016-05-06 19:47:10.553769] I [glusterd-handler.c:2912:glusterd_friend_add] 0-management: connect returned 0

[2016-05-06 19:47:10.553886] I [glusterd-handler.c:2398:__glusterd_handle_probe_query] 0-glusterd: Responded to 10.238.0.23, op_ret: 0, op_errno: 0, ret: 0

[2016-05-06 19:47:10.554650] I [glusterd-handler.c:2050:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: d5140e78-498d-4c63-868d-189554aef7d4

[2016-05-06 19:50:50.812036] E [glusterd-utils.c:4692:glusterd_brick_start] 0-management: Could not find peer on which brick 10.238.0.15:/mnt/ram/data resides

 

 

10.238.0.23:

 

[2016-05-06 19:46:28.982091] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req

[2016-05-06 19:46:31.930017] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:46:34.934960] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:46:37.916015] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:46:40.947036] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:46:43.950373] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:46:46.961104] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:46:49.966875] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:46:50.497510] I [glusterd-handler.c:918:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 10.238.0.22 24007

[2016-05-06 19:46:50.502555] I [glusterd-handler.c:2931:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 10.238.0.22 (24007)

[2016-05-06 19:46:50.511183] I [rpc-clnt.c:972:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600

[2016-05-06 19:46:50.511279] I [socket.c:3561:socket_init] 0-management: SSL support is NOT enabled

[2016-05-06 19:46:50.511300] I [socket.c:3576:socket_init] 0-management: using system polling thread

[2016-05-06 19:46:50.517005] I [glusterd-handler.c:2912:glusterd_friend_add] 0-management: connect returned 0

[2016-05-06 19:46:52.983838] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:46:55.975533] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:46:58.989536] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:47:01.994423] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:47:04.995025] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:47:07.995849] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:47:10.553738] I [glusterd-rpc-ops.c:234:__glusterd_probe_cbk] 0-glusterd: Received probe resp from uuid: 07580a32-c558-449d-b454-044fb679c908, host: 10.238.0.22

[2016-05-06 19:47:10.559641] I [glusterd-rpc-ops.c:306:__glusterd_probe_cbk] 0-glusterd: Received resp to probe req

[2016-05-06 19:47:11.006166] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:47:14.009705] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:47:16.996479] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:47:20.024705] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:47:23.035546] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server

[2016-05-06 19:47:26.041132] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
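
To summarize longer excerpts of these logs, a small Python sketch can tally the error-level entries. It assumes the line layout seen above, "[timestamp] LEVEL [file:line:function] domain: message"; the regular expression and the function name count_errors are illustrative, not part of Gluster.

```python
import re
from collections import Counter

# Matches lines such as:
# [2016-05-06 19:46:31.930017] E [glusterd-handshake.c:942:func] 0-management: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<src>[^:\]]+):(?P<lineno>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)$"
)


def count_errors(log_text):
    """Count E-level entries, keyed by (function, message)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = LOG_RE.match(line.strip())
        if m and m.group("level") == "E":
            counts[(m.group("func"), m.group("msg"))] += 1
    return counts


# Three lines copied from the 10.238.0.23 log above.
sample = """\
[2016-05-06 19:46:28.982091] I [glusterd-handler.c:1114:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2016-05-06 19:46:31.930017] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
[2016-05-06 19:46:34.934960] E [glusterd-handshake.c:942:__glusterd_mgmt_hndsk_version_ack_cbk] 0-management: Failed to get handshake ack from remote server
"""

for (func, msg), n in count_errors(sample).most_common():
    print(n, func, msg)
```

In the .23 excerpt above, the same handshake error repeats roughly every three seconds, both before and after the probe request at 19:46:50.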

 

 

Please advise on how to handle this issue.

 

Thanks,

Azamat

Phone: 703-667-8922

 

