On 01/17/2011 10:57 PM, Anand Avati wrote:
> Looks like you have a stale process running. Can you force kill all
> gluster daemons, rm -rf /etc/glusterd and start fresh? Please ensure
> name resolution works fine between the hosts.
>
> Avati
>

Primary:
# ps -ef | grep gluster
root       807     1  0 01:00 ?        00:00:00 /usr/local/sbin/glusterd -p /var/run/glusterd.pid

Secondary:
# ps -ef | grep gluster
root      1045     1  0 00:52 ?        00:00:00 /usr/local/sbin/glusterd -p /var/run/glusterd.pid

I don't see any stale processes, and nothing was marked defunct. I stopped all
daemons and checked with ps that nothing was running. I ran rm -rf /etc/glusterd/
on both servers. I can successfully ping between the servers both by hostname
and by IP. I restarted the daemons, retried the probe, and still have the same
problem.

Here is the primary log:

[2011-01-18 04:11:56.852521] I [glusterfsd.c:672:cleanup_and_exit] glusterfsd: shutting down
[2011-01-18 04:13:32.713646] I [glusterd.c:275:init] management: Using /etc/glusterd as working directory
[2011-01-18 04:13:32.714529] E [socket.c:322:__socket_server_bind] tcp.management: binding to failed: Address already in use
[2011-01-18 04:13:32.714544] E [socket.c:325:__socket_server_bind] tcp.management: Port is already in use
[2011-01-18 04:13:32.714607] I [glusterd.c:96:glusterd_uuid_init] glusterd: generated UUID: a39c5d2f-dac2-436b-b715-425becf9075c
Given volfile:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option working-directory /etc/glusterd
  4:     option transport-type socket,tcp,rdma
  5:     option transport.socket.keepalive-time 10
  6:     option transport.socket.keepalive-interval 2
  7: end-volume
  8:
+------------------------------------------------------------------------------+
[2011-01-18 04:13:55.74921] I [glusterd-handler.c:562:glusterd_handle_cli_probe] glusterd: Received CLI probe req 10.XXX.58.95 24007
[2011-01-18 04:13:55.76532] I [glusterd-handler.c:397:glusterd_friend_find] glusterd: Unable to find hostname: 10.XXX.58.95
[2011-01-18 04:13:55.76550] I [glusterd-handler.c:2615:glusterd_probe_begin] glusterd: Unable to find peerinfo for host: 10.XXX.58.95 (24007)
[2011-01-18 04:13:55.78817] W [rpc-transport.c:849:rpc_transport_load] rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2011-01-18 04:13:55.79386] I [glusterd-handler.c:2597:glusterd_friend_add] glusterd: connect returned 0
[2011-01-18 04:14:16.78380] E [socket.c:1661:socket_connect_finish] management: connection to failed (Connection timed out)

Anything else I can try?
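Ping only exercises ICMP, so a successful ping does not show whether the peer's glusterd management port (24007, the port in the probe request in the log) accepts TCP connections. The log also reports that the port was already in use when glusterd tried to bind it. A minimal Python 3 sketch for checking both conditions follows; it assumes Python 3 is available on the hosts, and both function names are hypothetical helpers, not part of GlusterFS.

```python
import socket


def tcp_port_reachable(host: str, port: int = 24007, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: unlike ping (ICMP), this opens a real TCP connection,
    which is what the glusterd peer probe needs.
    """
    try:
        # create_connection resolves the name and tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures,
        # all of which are OSError subclasses in Python 3
        return False


def port_free_locally(port: int = 24007, addr: str = "0.0.0.0") -> bool:
    """Return True if this host can bind a TCP socket to addr:port right now.

    Hypothetical helper: a False result corresponds to the
    "Address already in use" condition reported in the glusterd log.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((addr, port))
        except OSError:
            return False
    return True
```

For example, tcp_port_reachable("10.XXX.58.95") could be run from the primary toward the secondary (and the reverse), and port_free_locally() on each host with glusterd stopped. A False from the first check while ping still succeeds would be consistent with something like a firewall filtering TCP 24007, though that is a heuristic rather than a definitive diagnosis.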