Sure I am. Unfortunately it didn't change the result...

# killall glusterd
# ps -ef | grep gluster
root 15755 657 0 18:35 ttyS0 00:00:00 grep gluster
# rm /var/lib/glusterd/peers/*
# /usr/sbin/glusterd -p /var/run/glusterd.pid
# gluster peer probe 10.32.1.144
#

(I killed glusterd and removed the files on both servers.)

Regards
Andreas

On 03/24/15 05:36, Atin Mukherjee wrote:
> If you are okay to do a fresh set up I would recommend you to clean up
> /var/lib/glusterd/peers/* and then restart glusterd in both the nodes
> and then try peer probing.
>
> ~Atin
>
> On 03/23/2015 06:44 PM, Andreas wrote:
>> Hi,
>>
>> # gluster peer detach 10.32.1.144
>> (No output here. Similar to the problem with 'gluster peer probe'.)
>> # gluster peer detach 10.32.1.144 force
>> peer detach: failed: Peer is already being detached from cluster.
>> Check peer status by running gluster peer status
>> # gluster peer status
>> Number of Peers: 1
>>
>> Hostname: 10.32.1.144
>> Uuid: 82cdb873-28cc-4ed0-8cfe-2b6275770429
>> State: Probe Sent to Peer (Connected)
>>
>> # ping 10.32.1.144
>> PING 10.32.1.144 (10.32.1.144): 56 data bytes
>> 64 bytes from 10.32.1.144: seq=0 ttl=64 time=1.811 ms
>> 64 bytes from 10.32.1.144: seq=1 ttl=64 time=1.834 ms
>> ^C
>> --- 10.32.1.144 ping statistics ---
>> 2 packets transmitted, 2 packets received, 0% packet loss
>> round-trip min/avg/max = 1.811/1.822/1.834 ms
>>
>>
>> As previously stated, this problem seems to be similar to what I experienced with
>> 'gluster peer probe'. I can reboot the server, but the situation will be the same
>> (I've tried this many times).
>> Any ideas of which ports to investigate and how to do it to get the most reliable result?
>> Anything else that could cause this?
>>
>>
>>
>> Regards
>> Andreas
>>
>>
>> On 03/23/15 11:10, Atin Mukherjee wrote:
>>> On 03/23/2015 03:28 PM, Andreas Hollaus wrote:
>>>> Hi,
>>>>
>>>> This network problem is persistent. However, I can ping the server so I guess it
>>>> depends on the port no, right?
>>>> I tried to telnet to port 24007, but I was not sure how to interpret the result as I
>>>> got no response and no timeout (it just seemed to be waiting for something).
>>>> That's why I decided to install nmap, but according to that tool the port was
>>>> accessible. Are there any other ports that are vital to gluster peer probe?
>>>>
>>>> When you say 'deprobe', I guess you mean 'gluster peer detach'? That command shows
>>>> similar behaviour to gluster peer probe.
>>> Yes I meant peer detach. How about gluster peer detach force?
>>
>>>> Regards
>>>> Andreas
>>>>
>>>> On 03/23/15 05:34, Atin Mukherjee wrote:
>>>>> On 03/22/2015 07:11 PM, Andreas Hollaus wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I hope that these are the logs that you requested.
>>>>>>
>>>>>> Logs from 10.32.0.48:
>>>>>> ------------------------------
>>>>>> # more /var/log/glusterfs/.cmd_log_history
>>>>>> [2015-03-19 13:52:03.277438] : peer probe 10.32.1.144 : FAILED : Probe returned
>>>>>> with unknown errno -1
>>>>>>
>>>>>> # more /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
>>>>>> [2015-03-19 13:41:31.241768] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/s
>>>>>> bin/glusterd: Started running /usr/sbin/glusterd version 3.6.2 (args: /usr/sbin/
>>>>>> glusterd -p /var/run/glusterd.pid)
>>>>>> [2015-03-19 13:41:31.245352] I [glusterd.c:1214:init] 0-management: Maximum allo
>>>>>> wed open file descriptors set to 65536
>>>>>> [2015-03-19 13:41:31.245432] I [glusterd.c:1259:init] 0-management: Using /var/l
>>>>>> ib/glusterd as working directory
>>>>>> [2015-03-19 13:41:31.247826] I [glusterd-store.c:2063:glusterd_restore_op_versio
>>>>>> n] 0-management: Detected new install. Setting op-version to maximum : 30600
>>>>>> [2015-03-19 13:41:31.247902] I [glusterd-store.c:3497:glusterd_store_retrieve_mi
>>>>>> ssed_snaps_list] 0-management: No missed snaps list.
>>>>>> Final graph:
>>>>>> +------------------------------------------------------------------------------+
>>>>>> 1: volume management
>>>>>> 2: type mgmt/glusterd
>>>>>> 3: option rpc-auth.auth-glusterfs on
>>>>>> 4: option rpc-auth.auth-unix on
>>>>>> 5: option rpc-auth.auth-null on
>>>>>> 6: option transport.socket.listen-backlog 128
>>>>>> 7: option ping-timeout 30
>>>>>> 8: option transport.socket.read-fail-log off
>>>>>> 9: option transport.socket.keepalive-interval 2
>>>>>> 10: option transport.socket.keepalive-time 10
>>>>>> 11: option transport-type socket
>>>>>> 12: option working-directory /var/lib/glusterd
>>>>>> 13: end-volume
>>>>>> 14:
>>>>>> +------------------------------------------------------------------------------+
>>>>>> [2015-03-19 13:42:02.258403] I [glusterd-handler.c:1015:__glusterd_handle_cli_pr
>>>>>> obe] 0-glusterd: Received CLI probe req 10.32.1.144 24007
>>>>>> [2015-03-19 13:42:02.259456] I [glusterd-handler.c:3165:glusterd_probe_begin] 0-
>>>>>> glusterd: Unable to find peerinfo for host: 10.32.1.144 (24007)
>>>>>> [2015-03-19 13:42:02.259664] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-manag
>>>>>> ement: setting frame-timeout to 600
>>>>>> [2015-03-19 13:42:02.260488] I [glusterd-handler.c:3098:glusterd_friend_add] 0-m
>>>>>> anagement: connect returned 0
>>>>>> [2015-03-19 13:42:02.270316] I [glusterd.c:176:glusterd_uuid_generate_save] 0-ma
>>>>>> nagement: generated UUID: 4441e237-89d6-4cdf-a212-f17ecb953b58
>>>>>> [2015-03-19 13:42:02.273427] I [glusterd-rpc-ops.c:244:__glusterd_probe_cbk] 0-m
>>>>>> anagement: Received probe resp from uuid: 82cdb873-28cc-4ed0-8cfe-2b6275770429,
>>>>>> host: 10.32.1.144
>>>>>> [2015-03-19 13:42:02.273681] I [glusterd-rpc-ops.c:386:__glusterd_probe_cbk] 0-g
>>>>>> lusterd: Received resp to probe req
>>>>>> [2015-03-19 13:42:02.278863] I [glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_
>>>>>> versions_ack] 0-management: using the op-version 30600
>>>>>> [2015-03-19 13:52:03.277422] E [rpc-clnt.c:201:call_bail] 0-management: bailing
>>>>>> out frame type(Peer mgmt) op(--(2)) xid = 0x6 sent = 2015-03-19 13:42:02.273482.
>>>>>> timeout = 600 for 10.32.1.144:24007
>>>>> Here is the issue, there was some problem in the network at the time
>>>>> when peer probe was issued. This is why the call bail is seen. Could you
>>>>> try to deprobe and then probe it back again?
>>>>>> [2015-03-19 13:52:03.277453] I [socket.c:3366:socket_submit_reply] 0-socket.mana
>>>>>> gement: not connected (priv->connected = 255)
>>>>>> [2015-03-19 13:52:03.277468] E [rpcsvc.c:1247:rpcsvc_submit_generic] 0-rpc-servi
>>>>>> ce: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2,
>>>>>> Proc: 1) to rpc-transport (socket.management)
>>>>>> [2015-03-19 13:52:03.277483] E [glusterd-utils.c:387:glusterd_submit_reply] 0-:
>>>>>> Reply submission failed
>>>>>>
>>>>>>
>>>>>>
>>>>>> Logs from 10.32.1.144:
>>>>>> ---------------------------------
>>>>>> # more ./.cmd_log_history
>>>>>>
>>>>>> # more ./etc-glusterfs-glusterd.vol.log
>>>>>> [1970-01-01 00:00:53.225739] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/s
>>>>>> bin/glusterd: Started running /usr/sbin/glusterd version 3.6.2 (args: /usr/sbin/
>>>>>> glusterd -p /var/run/glusterd.pid)
>>>>>> [1970-01-01 00:00:53.229222] I [glusterd.c:1214:init] 0-management: Maximum allo
>>>>>> wed open file descriptors set to 65536
>>>>>> [1970-01-01 00:00:53.229301] I [glusterd.c:1259:init] 0-management: Using /var/l
>>>>>> ib/glusterd as working directory
>>>>>> [1970-01-01 00:00:53.231653] I [glusterd-store.c:2063:glusterd_restore_op_versio
>>>>>> n] 0-management: Detected new install. Setting op-version to maximum : 30600
>>>>>> [1970-01-01 00:00:53.231730] I [glusterd-store.c:3497:glusterd_store_retrieve_mi
>>>>>> ssed_snaps_list] 0-management: No missed snaps list.
>>>>>> Final graph:
>>>>>> +------------------------------------------------------------------------------+
>>>>>> 1: volume management
>>>>>> 2: type mgmt/glusterd
>>>>>> 3: option rpc-auth.auth-glusterfs on
>>>>>> 4: option rpc-auth.auth-unix on
>>>>>> 5: option rpc-auth.auth-null on
>>>>>> 6: option transport.socket.listen-backlog 128
>>>>>> 7: option ping-timeout 30
>>>>>> 8: option transport.socket.read-fail-log off
>>>>>> 9: option transport.socket.keepalive-interval 2
>>>>>> 10: option transport.socket.keepalive-time 10
>>>>>> 11: option transport-type socket
>>>>>> 12: option working-directory /var/lib/glusterd
>>>>>> 13: end-volume
>>>>>> 14:
>>>>>> +------------------------------------------------------------------------------+
>>>>>> [1970-01-01 00:01:24.417689] I [glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_
>>>>>> versions_ack] 0-management: using the op-version 30600
>>>>>> [1970-01-01 00:01:24.417736] I [glusterd.c:176:glusterd_uuid_generate_save] 0-ma
>>>>>> nagement: generated UUID: 82cdb873-28cc-4ed0-8cfe-2b6275770429
>>>>>> [1970-01-01 00:01:24.420067] I [glusterd-handler.c:2523:__glusterd_handle_probe_
>>>>>> query] 0-glusterd: Received probe from uuid: 4441e237-89d6-4cdf-a212-f17ecb953b5
>>>>>> 8
>>>>>> [1970-01-01 00:01:24.420158] I [glusterd-handler.c:2551:__glusterd_handle_probe_
>>>>>> query] 0-glusterd: Unable to find peerinfo for host: 10.32.0.48 (24007)
>>>>>> [1970-01-01 00:01:24.420379] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-manag
>>>>>> ement: setting frame-timeout to 600
>>>>>> [1970-01-01 00:01:24.421140] I [glusterd-handler.c:3098:glusterd_friend_add] 0-m
>>>>>> anagement: connect returned 0
>>>>>> [1970-01-01 00:01:24.421167] I [glusterd-handler.c:2575:__glusterd_handle_probe_
>>>>>> query] 0-glusterd: Responded to 10.32.0.48, op_ret: 0, op_errno: 0, ret: 0
>>>>>> [1970-01-01 00:01:24.422991] I [glusterd-handler.c:2216:__glusterd_handle_incomi
>>>>>> ng_friend_req] 0-glusterd: Received probe from uuid: 4441e237-89d6-4cdf-a212-f17
>>>>>> ecb953b58
>>>>>> [1970-01-01 00:01:24.423024] E [glusterd-utils.c:5760:glusterd_compare_friend_da
>>>>>> ta] 0-management: Importing global options failed
>>>>>> [1970-01-01 00:01:24.423036] E [glusterd-sm.c:1078:glusterd_friend_sm] 0-gluster
>>>>>> d: handler returned: -2
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>> Andreas
>>>>>>
>>>>>>
>>>>>> On 03/22/15 07:33, Atin Mukherjee wrote:
>>>>>>> On 03/22/2015 12:09 AM, Andreas Hollaus wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I get a strange result when I execute 'gluster peer probe'. The command hangs and
>>>>>>>> seems to time out without any message (I can ping the address):
>>>>>>>> # gluster peer probe 10.32.1.144
>>>>>>>> # echo $?
>>>>>>>> 146
>>>>>>> Could you provide the glusterd log and .cmd_log_history for all the
>>>>>>> nodes in the cluster?
>>>>>>>> The status looks promising, but there's a difference between this output and what
>>>>>>>> you normally get from a successful call:
>>>>>>>> # gluster peer status
>>>>>>>> Number of Peers: 1
>>>>>>>>
>>>>>>>> Hostname: 10.32.1.144
>>>>>>>> Uuid: 0b008d3e-c51b-4243-ad19-c79c869ba9f2
>>>>>>>> State: Probe Sent to Peer (Connected)
>>>>>>>>
>>>>>>>> (instead of 'State: Peer in Cluster (Connected)')
>>>>>>>>
>>>>>>>> Running the command again will tell you that it is connected:
>>>>>>>>
>>>>>>>> # gluster peer probe 10.32.1.144
>>>>>>>> peer probe: success. Host 10.32.1.144 port 24007 already in peer list
>>>>>>> This means that this peer was added locally but peer handshake was not
>>>>>>> completed for previous peer probe transaction. I would be interested to
>>>>>>> see the logs and then can comment on what went wrong.
>>>>>>>> But when you try to add a brick from that server it fails:
>>>>>>>>
>>>>>>>> # gluster volume add-brick c_test replica 2 10.32.1.144:/opt/lvmdir/c2 force
>>>>>>>> volume add-brick: failed: Host 10.32.1.144 is not in 'Peer in Cluster' state
>>>>>>>>
>>>>>>>> The volume was previously created using the following commands:
>>>>>>>> # gluster volume create c_test 10.32.0.48:/opt/lvmdir/c2 force
>>>>>>>> volume create: c_test: success: please start the volume to access data
>>>>>>>> # gluster volume start c_test
>>>>>>>> volume start: c_test: success
>>>>>>>>
>>>>>>>> What could be the reason for this problem?
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Andreas
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Gluster-users mailing list
>>>>>>>> Gluster-users@xxxxxxxxxxx
>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
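
For anyone who wants to script the check discussed in this thread: a minimal sketch, assuming the `gluster peer status` output has the same layout as the text pasted above (Hostname / Uuid / State blocks). It lists the peers whose state is anything other than 'Peer in Cluster'. The function names `parse_peer_status` and `peers_not_in_cluster` are made up for illustration and are not part of gluster.

```python
def parse_peer_status(text):
    """Parse text in the layout printed by `gluster peer status` into dicts.

    Assumption: each peer is a block of "Hostname:", "Uuid:" and "State:"
    lines, as in the output quoted in this thread.
    """
    peers = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Hostname":
            # A "Hostname" line starts a new peer block.
            current = {"Hostname": value}
            peers.append(current)
        elif key in ("Uuid", "State") and current:
            current[key] = value
    return peers


def peers_not_in_cluster(text):
    """Return the peers whose State does not start with 'Peer in Cluster'."""
    return [
        peer for peer in parse_peer_status(text)
        if not peer.get("State", "").startswith("Peer in Cluster")
    ]


# Sample copied from the `gluster peer status` output earlier in the thread.
SAMPLE = """\
Number of Peers: 1

Hostname: 10.32.1.144
Uuid: 82cdb873-28cc-4ed0-8cfe-2b6275770429
State: Probe Sent to Peer (Connected)
"""

if __name__ == "__main__":
    for peer in peers_not_in_cluster(SAMPLE):
        print(peer["Hostname"], "->", peer["State"])
```

In practice the text would come from running `gluster peer status` on each node; a peer reported here is one for which 'gluster volume add-brick' would be expected to fail in the way shown above.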