On 01/17/2011 11:22 PM, Gerry Reno wrote:
> On 01/17/2011 10:57 PM, Anand Avati wrote:
>
>> Looks like you have a stale process running. Can you force kill all
>> gluster daemons, rm -rf /etc/glusterd and start fresh? Please ensure
>> name resolution works fine between the hosts.
>>
>> Avati
>>
>
> Primary:
>
> # ps -ef | grep gluster
> root 807 1 0 01:00 ? 00:00:00
> /usr/local/sbin/glusterd -p /var/run/glusterd.pid
>
>
> Secondary:
>
> # ps -ef | grep gluster
> root 1045 1 0 00:52 ? 00:00:00
> /usr/local/sbin/glusterd -p /var/run/glusterd.pid
>
> I don't see any stale processes. Nothing was marked defunct.
>
> I stopped all daemons and checked with ps that nothing was running.
> I did rm -rf /etc/glusterd/ on both servers.
> I can successfully ping between the servers both by hostname and by IP.
>
> I restarted the daemons and retried the probe and still have the same
> problem.
>
> Here is the primary log:
>
> [2011-01-18 04:11:56.852521] I [glusterfsd.c:672:cleanup_and_exit]
> glusterfsd: shutting down
> [2011-01-18 04:13:32.713646] I [glusterd.c:275:init] management:
> Using /etc/glusterd as working directory
> [2011-01-18 04:13:32.714529] E [socket.c:322:__socket_server_bind]
> tcp.management: binding to failed: Address already in use
> [2011-01-18 04:13:32.714544] E [socket.c:325:__socket_server_bind]
> tcp.management: Port is already in use
> [2011-01-18 04:13:32.714607] I [glusterd.c:96:glusterd_uuid_init]
> glusterd: generated UUID: a39c5d2f-dac2-436b-b715-425becf9075c
> Given volfile:
> +------------------------------------------------------------------------------+
>   1: volume management
>   2:     type mgmt/glusterd
>   3:     option working-directory /etc/glusterd
>   4:     option transport-type socket,tcp,rdma
>   5:     option transport.socket.keepalive-time 10
>   6:     option transport.socket.keepalive-interval 2
>   7: end-volume
>   8:
>
> +------------------------------------------------------------------------------+
> [2011-01-18 04:13:55.74921] I
> [glusterd-handler.c:562:glusterd_handle_cli_probe] glusterd:
> Received CLI probe req 10.XXX.58.95 24007
> [2011-01-18 04:13:55.76532] I
> [glusterd-handler.c:397:glusterd_friend_find] glusterd: Unable to
> find hostname: 10.XXX.58.95
> [2011-01-18 04:13:55.76550] I
> [glusterd-handler.c:2615:glusterd_probe_begin] glusterd: Unable to
> find peerinfo for host: 10.XXX.58.95 (24007)
> [2011-01-18 04:13:55.78817] W
> [rpc-transport.c:849:rpc_transport_load] rpc-transport: missing
> 'option transport-type'. defaulting to "socket"
> [2011-01-18 04:13:55.79386] I
> [glusterd-handler.c:2597:glusterd_friend_add] glusterd: connect
> returned 0
> [2011-01-18 04:14:16.78380] E [socket.c:1661:socket_connect_finish]
> management: connection to failed (Connection timed out)
>
>
> Anything else I can try?
>

I just rebooted both instances and then checked the log right after bootup.
The daemons are set to start during boot sequence.
Still some kind of connection problem.

Primary log:

[2011-01-18 04:27:53.706382] I [glusterfsd.c:672:cleanup_and_exit] glusterfsd: shutting down
[2011-01-18 04:28:09.699032] I [glusterd.c:275:init] management: Using /etc/glusterd as working directory
[2011-01-18 04:28:09.729714] E [socket.c:322:__socket_server_bind] tcp.management: binding to failed: Address already in use
[2011-01-18 04:28:09.729751] E [socket.c:325:__socket_server_bind] tcp.management: Port is already in use
[2011-01-18 04:28:09.731413] I [glusterd.c:87:glusterd_uuid_init] glusterd: retrieved UUID: a39c5d2f-dac2-436b-b715-425becf9075c
[2011-01-18 04:28:09.734698] E [glusterd-store.c:1446:glusterd_store_retrieve_peers] management: key: 0x248e0e0, and value: (nil)
Given volfile:
+------------------------------------------------------------------------------+
  1: volume management
  2:     type mgmt/glusterd
  3:     option working-directory /etc/glusterd
  4:     option transport-type socket,tcp,rdma
  5:     option transport.socket.keepalive-time 10
  6:     option transport.socket.keepalive-interval 2
  7: end-volume
  8:
+------------------------------------------------------------------------------+
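
The logs above show two different socket failures: a bind that fails with "Address already in use", and an outgoing connection to the peer that ends in "Connection timed out". As a rough way to test both independently of glusterd, here is a minimal Python sketch. The helper names can_connect and port_is_free are made up for illustration and are not part of any gluster tooling. Port 24007 is taken from the probe line in the log; the peer address is redacted there, so it has to be passed on the command line.

```python
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    A False result covers refused connections, timeouts and name
    resolution failures alike (hypothetical helper, not a gluster API).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def port_is_free(port, host="0.0.0.0"):
    """Return True if this process can bind a TCP socket to (host, port).

    A False result usually means something already holds the port
    (EADDRINUSE, i.e. "Address already in use"), but a permission
    error would also land here.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False


if __name__ == "__main__":
    # Usage: python portcheck.py <peer-host> [port]
    # 24007 is the port that appears in the probe request in the log.
    if len(sys.argv) < 2:
        sys.exit("usage: portcheck.py <peer-host> [port]")
    peer = sys.argv[1]
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 24007
    print("local port %d free: %s" % (port, port_is_free(port)))
    print("peer %s:%d reachable: %s" % (peer, port, can_connect(peer, port)))
```

If the local port reports as not free while glusterd is stopped, something else is holding it. If the peer is not reachable on that port even though ping works, a firewall or security group rule between the instances would be one thing to look at, since ping only exercises ICMP and says nothing about TCP.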