Based on the error log, I'd guess at a DNS problem. Can your machines do DNS lookups and reverse lookups to each other (i.e. do names resolve to the correct IP addresses and vice versa)? Based on your hostnames, it looks like you're running on a ROCKS cluster, so you might have competing (or incorrect) DNS info (cluster DNS vs. institutional DNS vs. /etc/hosts). That shouldn't be the case within a cluster, but firewalls can obviously be a problem too.

hjm

On Thu, Aug 2, 2012 at 8:21 AM, Dan Bretherton <d.a.bretherton at reading.ac.uk> wrote:
> Dear All-
> My recent upgrade from 3.2.6 to 3.3.0 went well, but now I can't add new
> peers to the cluster. I can create a new peer group of servers all with 3.3
> freshly installed, but if any one of them was upgraded from 3.2 the
> "gluster peer probe" commands just hang for a while and return nothing.
> Following that, "gluster peer status" results in output like the following
> for the new peer being added.
>
> Hostname: compute-0-4.nerc-essc.ac.uk
> Uuid: 111612e4-537b-49b4-9e88-2e0e1bae7fdf
> State: Establishing Connection (Connected)
>
> Errors like these are produced in etc-glusterfs-glusterd.vol.log.
>
> [2012-08-02 13:00:53.553927] I [glusterd-op-sm.c:2653:glusterd_op_txn_complete] 0-glusterd: Cleared local lock
> [2012-08-02 15:55:19.244849] I [glusterd-handler.c:679:glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req compute-0-4.nerc-essc.ac.uk 24007
> [2012-08-02 15:55:19.357191] I [glusterd-handler.c:423:glusterd_friend_find] 0-glusterd: Unable to find hostname: compute-0-4.nerc-essc.ac.uk
> [2012-08-02 15:55:19.357261] I [glusterd-handler.c:2222:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: compute-0-4.nerc-essc.ac.uk (24007)
> [2012-08-02 15:55:19.385050] I [glusterd-handler.c:2204:glusterd_friend_add] 0-management: connect returned 0
> [2012-08-02 15:55:19.387162] E [socket.c:1715:socket_connect_finish] 0-management: connection to failed (Connection refused)
> [2012-08-02 15:55:19.387239] I [glusterd-handler.c:2400:glusterd_xfer_cli_probe_resp] 0-glusterd: Responded to CLI, ret: 0
> [2012-08-02 15:55:19.387274] I [mem-pool.c:576:mem_pool_destroy] 0-management: size=2236 max=0 total=0
> [2012-08-02 15:55:19.387294] I [mem-pool.c:576:mem_pool_destroy] 0-management: size=124 max=0 total=0
> [2012-08-02 15:55:33.026866] I [glusterd-handler.c:813:glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
> [2012-08-02 15:55:49.766295] I [glusterd-handler.c:679:glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req compute-0-4.nerc-essc.ac.uk 24007
> [2012-08-02 15:55:49.841049] I [glusterd-handler.c:423:glusterd_friend_find] 0-glusterd: Unable to find hostname: compute-0-4.nerc-essc.ac.uk
> [2012-08-02 15:55:49.841101] I [glusterd-handler.c:2222:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: compute-0-4.nerc-essc.ac.uk (24007)
> [2012-08-02 15:55:49.857231] I [glusterd-handler.c:2204:glusterd_friend_add] 0-management: connect returned 0
> [2012-08-02 15:55:49.857804] I [glusterd-handshake.c:397:glusterd_set_clnt_mgmt_program] 0-: Using Program glusterd mgmt, Num (1238433), Version (2)
> [2012-08-02 15:55:49.857840] I [glusterd-handshake.c:403:glusterd_set_clnt_mgmt_program] 0-: Using Program Peer mgmt, Num (1238437), Version (2)
> [2012-08-02 15:55:49.868300] I [glusterd-rpc-ops.c:218:glusterd3_1_probe_cbk] 0-glusterd: Received probe resp from uuid: 111612e4-537b-49b4-9e88-2e0e1bae7fdf, host: compute-0-4.nerc-essc.ac.uk
> [2012-08-02 15:55:49.868344] I [glusterd-handler.c:411:glusterd_friend_find] 0-glusterd: Unable to find peer by uuid
> [2012-08-02 15:55:49.868406] E [glusterd-sm.c:1022:glusterd_friend_sm] 0-glusterd: handler returned: -1
> [2012-08-02 15:55:49.868425] I [glusterd-rpc-ops.c:286:glusterd3_1_probe_cbk] 0-glusterd: Received resp to probe req
>
> In /etc/glusterd/peers a file with the name of the machine being added is
> produced, like this example.
>
> [root@bdan10 peers]# cat compute-0-4.nerc-essc.ac.uk
> uuid=00000000-0000-0000-0000-000000000000
> state=0
> hostname1=compute-0-4.nerc-essc.ac.uk
>
> However the machine in question does have a valid uuid as shown below.
>
> [root@compute-0-4 etc]# cat /var/lib/glusterd/glusterd.info
> UUID=111612e4-537b-49b4-9e88-2e0e1bae7fdf
>
> This one had GlusterFS 3.3 freshly installed and was not upgraded from 3.2.
> On this machine the command "gluster peer status" outputs the following.
>
> Number of Peers: 1
>
> Hostname: 192.171.166.92
> Uuid: 00000000-0000-0000-0000-000000000000
> State: Establishing Connection (Connected)
>
> The IP address shown refers to the server where "gluster peer probe" was
> executed.
>
> I tried restarting glusterd on all the servers but it didn't make any
> difference, and doing the "peer probe" from a different server in the
> cluster had the same result. Has anyone else experienced this problem and
> is there a solution or work-around? All suggestions would be much
> appreciated.
>
> Regards,
> Dan.
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
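Here is a minimal Python 3 sketch of the forward/reverse lookup check suggested above. It also adds a TCP connect to port 24007, the port that appears in the probe log lines, to catch a firewall in the way. The script and its helpers (`check_dns`, `check_port`, `main`) are made up for illustration and are not part of GlusterFS. It only reports what the local resolver returns (whatever mix of cluster DNS, institutional DNS and /etc/hosts that machine consults). For that reason it has to be run on every server, including the one being probed, with the other servers' hostnames as arguments.

```python
#!/usr/bin/env python3
"""Hypothetical helper: check forward/reverse DNS agreement and glusterd
reachability for a list of peer hostnames given on the command line."""
import socket
import sys

GLUSTERD_PORT = 24007  # port shown in the "Received CLI probe req" log lines


def check_dns(host, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of DNS problems for `host`; an empty list means it looks OK.

    `forward` and `reverse` default to the system resolver (which normally
    also consults /etc/hosts) and can be replaced with fakes for testing.
    """
    try:
        ip = forward(host)
    except OSError as exc:
        return [f"{host}: forward lookup failed ({exc})"]
    try:
        name, aliases, _addrs = reverse(ip)
    except OSError as exc:
        return [f"{host} -> {ip}: reverse lookup failed ({exc})"]
    # Strict comparison: if the reverse lookup yields only a short name or a
    # different FQDN, that is reported as a mismatch.
    wanted = host.lower().rstrip(".")
    names = {n.lower().rstrip(".") for n in [name, *aliases]}
    if wanted not in names:
        return [f"{host} -> {ip} -> {name}: reverse lookup does not match"]
    return []


def check_port(host, port=GLUSTERD_PORT, timeout=3.0):
    """Return a one-item list describing the failure if host:port refuses TCP."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return []
    except OSError as exc:
        return [f"{host}:{port}: TCP connect failed ({exc})"]


def main(hosts):
    """Check every host, print the results, and return 1 if anything failed."""
    status = 0
    for host in hosts:
        problems = check_dns(host) + check_port(host)
        if problems:
            status = 1
            for problem in problems:
                print(problem)
        else:
            print(f"{host}: OK")
    return status


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"usage: {sys.argv[0]} HOST [HOST ...]")
    else:
        sys.exit(main(sys.argv[1:]))
```

For example, you would run it on the probing server with `compute-0-4.nerc-essc.ac.uk` as the argument, then on compute-0-4 with the probing server's name. The name comparison is deliberately strict, on the assumption that any disagreement between the resolvers is worth a closer look.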