Is node 5 still showing "Disconnected" for node 6?

On Mon, Mar 21, 2011 at 12:08 PM, Burnash, James <jburnash at knight.com> wrote:
> After 'service glusterd restart' on node 6:
>
> root@jc1letgfs5:/etc/glusterd/vols# gluster peer status
> Number of Peers: 3
>
> Hostname: jc1letgfs6
> Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
> State: Peer in Cluster (Disconnected)
>
> Hostname: jc1letgfs7
> Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
> State: Peer in Cluster (Connected)
>
> Hostname: jc1letgfs8
> Uuid: 13f4ce3f-042e-4144-a76c-d2b1b91676bd
> State: Peer in Cluster (Connected)
>
> BUT ... after 'service glusterd restart' on node 5:
>
> root@jc1letgfs5:/etc/glusterd/vols# gluster peer status
> Number of Peers: 3
>
> Hostname: jc1letgfs7
> Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
> State: Peer in Cluster (Connected)
>
> Hostname: jc1letgfs8
> Uuid: 13f4ce3f-042e-4144-a76c-d2b1b91676bd
> State: Peer in Cluster (Connected)
>
> Hostname: jc1letgfs6
> Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
> State: Peer in Cluster (Connected)
>
> Works now. Thanks so much. I suspect a race condition of some sort, though what I'll leave up to the devs.
>
> -----Original Message-----
> From: Mohit Anchlia [mailto:mohitanchlia at gmail.com]
> Sent: Monday, March 21, 2011 2:57 PM
> To: Burnash, James; gluster-users at gluster.org
> Subject: Re: What does this error mean?
>
> At this point can you do /etc/init.d/gluster stop and then start and
> see if this changes anything? Or do you see same behaviour? I am
> thinking gluster might have tried to start too soon on reboot.
>
> On Mon, Mar 21, 2011 at 11:43 AM, Burnash, James <jburnash at knight.com> wrote:
>> Short answers - yes all on the same subnet.
>> Every host can ping the others
>> Iptables shows empty entries for all filters
>>
>> Details are here - http://pastebin.com/eKtRMbGE
>>
>> I did explicitly turn the iptables off again, and then checked again:
>>
>> jc1letgfs5
>> Firewall is stopped.
>>
>> jc1letgfs6
>> Firewall is stopped.
>>
>> jc1letgfs7
>> Firewall is stopped.
>>
>> jc1letgfs8
>> Firewall is stopped.
>>
>> Thanks,
>>
>> James
>>
>> -----Original Message-----
>> From: Mohit Anchlia [mailto:mohitanchlia at gmail.com]
>> Sent: Monday, March 21, 2011 2:25 PM
>> To: Burnash, James
>> Cc: gluster-users at gluster.org
>> Subject: Re: What does this error mean?
>>
>> Are they in the same subnet? What happens if you ping these hosts
>> individually? Do they ping?
>>
>> I looked closely at the error you posted, and "connection to
>> 10.20.72.157:24007 failed (No route to host)" points to either a firewall
>> issue or possibly a switch issue on the network. A ping test from each
>> host to every other host will be helpful.
>>
>> Can you post the results of ping and also "service iptables status" from each node?
>>
>> On Mon, Mar 21, 2011 at 11:16 AM, Burnash, James <jburnash at knight.com> wrote:
>>> A little more information:
>>>
>>> From the original (first peer node):
>>> root@jc1letgfs5:/etc/glusterd/vols# gluster peer status
>>> Number of Peers: 3
>>>
>>> Hostname: jc1letgfs6
>>> Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
>>> State: Peer in Cluster (Disconnected)
>>>
>>> Hostname: jc1letgfs7
>>> Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
>>> State: Peer in Cluster (Connected)
>>>
>>> Hostname: jc1letgfs8
>>> Uuid: 13f4ce3f-042e-4144-a76c-d2b1b91676bd
>>> State: Peer in Cluster (Connected)
>>>
>>>
>>> From the problem node:
>>> *** NOTE - only one Peer seen
>>> root@jc1letgfs6:~# gluster peer status
>>> Number of Peers: 1
>>>
>>> Hostname: 10.20.72.156
>>> Uuid: 95e1d79a-632a-4774-9d7e-a7234cb084ca
>>> State: Peer in Cluster (Connected)
>>>
>>>
>>> From a different peer node:
>>> root@jc1letgfs8:~# gluster peer status
>>> Number of Peers: 3
>>>
>>> Hostname: jc1letgfs6
>>> Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
>>> State: Peer Rejected (Connected)
>>>
>>> Hostname: jc1letgfs7
>>> Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
>>> State: Peer in Cluster (Connected)
>>>
>>> Hostname: 10.20.72.156
>>> Uuid: 95e1d79a-632a-4774-9d7e-a7234cb084ca
>>> State: Peer in Cluster (Connected)
>>>
>>> -----Original Message-----
>>> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
>>> Sent: Monday, March 21, 2011 2:05 PM
>>> To: Mohit Anchlia
>>> Cc: gluster-users at gluster.org
>>> Subject: Re: What does this error mean?
>>>
>>> I did do this, and nothing in particular stands out.
>>>
>>> I'll exercise it some more, and see if we can get something that will at least point in the proper direction.
>>>
>>> I suspect that another reboot of the affected machine will fix this condition - but it won't help me understand the root problem the next time this happens.
>>>
>>> Thanks,
>>>
>>> James
>>>
>>> -----Original Message-----
>>> From: Mohit Anchlia [mailto:mohitanchlia at gmail.com]
>>> Sent: Monday, March 21, 2011 12:40 PM
>>> To: Burnash, James
>>> Cc: gluster-users at gluster.org
>>> Subject: Re: What does this error mean?
>>>
>>> Can you turn on DEBUG and see if there is something that stands out?
>>>
>>> On Mon, Mar 21, 2011 at 9:34 AM, Burnash, James <jburnash at knight.com> wrote:
>>>> Does anybody have any clue as to why this is happening? The problem has persisted for several days now, but I can't find anything at all in the logs to possibly explain why this is so.
>>>>
>>>> -----Original Message-----
>>>> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
>>>> Sent: Wednesday, March 16, 2011 9:10 AM
>>>> To: gluster-users at gluster.org
>>>> Subject: [SPAM?] What does this error mean?
>>>> Importance: Low
>>>>
>>>> Hello.
>>>>
>>>> I purposely crashed node jc1letgfs6 (via 'echo b>/proc/sysrq-trigger') to test mirroring. Even after the node rebooted and came back online, I still see "Disconnected" for that node when I run the following command on the first storage node:
>>>>
>>>> root@jc1letgfs5:/etc/glusterd/vols# gluster peer status
>>>> Number of Peers: 3
>>>>
>>>> Hostname: jc1letgfs6
>>>> Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
>>>> State: Peer in Cluster (Disconnected)
>>>>
>>>> Hostname: jc1letgfs7
>>>> Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
>>>> State: Peer in Cluster (Disconnected)
>>>>
>>>> Hostname: jc1letgfs8
>>>> Uuid: 13f4ce3f-042e-4144-a76c-d2b1b91676bd
>>>> State: Peer in Cluster (Connected)
>>>>
>>>> This is running on 4 servers with CentOS 5.5 (x86_64), GlusterFS 3.1.1.
>>>>
>>>> Here is the volume info:
>>>>
>>>> # gluster volume info
>>>>
>>>> Volume Name: test-pfs-ro1
>>>> Type: Distributed-Replicate
>>>> Status: Started
>>>> Number of Bricks: 4 x 2 = 8
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: jc1letgfs5:/export/read-only/g01
>>>> Brick2: jc1letgfs6:/export/read-only/g01
>>>> Brick3: jc1letgfs5:/export/read-only/g02
>>>> Brick4: jc1letgfs6:/export/read-only/g02
>>>> Brick5: jc1letgfs7:/export/read-only/g01
>>>> Brick6: jc1letgfs8:/export/read-only/g01
>>>> Brick7: jc1letgfs7:/export/read-only/g02
>>>> Brick8: jc1letgfs8:/export/read-only/g02
>>>> Options Reconfigured:
>>>> performance.stat-prefetch: on
>>>> performance.cache-size: 2GB
>>>> network.ping-timeout: 10
>>>>
>>>> Even with this error, mirroring functions as expected, and the node is recognized and utilized, as can be seen in this log fragment from /var/log/glusterfs/etc-glusterfs-glusterd.vol.log on jc1letgfs5:
>>>>
>>>> [2011-03-13 23:51:31.458329] E [socket.c:1656:socket_connect_finish] management: connection to 10.20.72.157:24007 failed (No route to host)
>>>> [2011-03-13 23:53:49.42170] I [glusterd3_1-mops.c:172:glusterd3_1_friend_add_cbk] glusterd: Received ACC from uuid: cd590fad-022c-4b9a-97f5-3262080d772d, host: jc1letgfs6, port: 0
>>>> [2011-03-13 23:53:49.42204] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
>>>> [2011-03-13 23:53:49.42320] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
>>>> [2011-03-13 23:53:49.42336] I [glusterd-handler.c:2267:glusterd_handle_friend_update] glusterd: Received friend update from uuid: cd590fad-022c-4b9a-97f5-3262080d772d
>>>> [2011-03-13 23:53:49.42359] I [glusterd-handler.c:2312:glusterd_handle_friend_update] : Received uuid: 95e1d79a-632a-4774-9d7e-a7234cb084ca, hostname:10.20.72.156
>>>> [2011-03-13 23:53:49.42412] I [glusterd-handler.c:2315:glusterd_handle_friend_update] : Received my uuid as Friend
>>>>
>>>>
>>>> Any pointers or help would be appreciated.
>>>>
>>>> James Burnash, Unix Engineering
>>>>
>>>>
>>>> DISCLAIMER:
>>>> This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and may contain legally privileged and/or confidential information. If you are not the intended recipient of this e-mail, you are hereby notified that any dissemination, distribution or copying of this e-mail, and any attachments thereto, is strictly prohibited. If you have received this in error, please immediately notify me and permanently delete the original and any copy of any e-mail and any printout thereof. E-mail transmission cannot be guaranteed to be secure or error-free. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission.
>>>> NOTICE REGARDING PRIVACY AND CONFIDENTIALITY Knight Capital Group may, at its discretion, monitor and review the content of all e-mail communications.
>>>> http://www.knight.com
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
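
For anyone who wants to watch for this condition instead of eyeballing the output, here is a minimal sketch that parses text in the 'gluster peer status' layout quoted in this thread and lists peers that are not "Peer in Cluster (Connected)". The function names (parse_peer_status, unhealthy_peers) are made up for illustration and are not part of GlusterFS. The sample text is copied from the thread, and the parser assumes the Hostname/Uuid/State layout shown there, which other GlusterFS versions may not follow.

```python
import re
from typing import Dict, List

# Sample copied from the thread: output seen on jc1letgfs5 before glusterd
# was restarted on that node.
SAMPLE = """\
Number of Peers: 3

Hostname: jc1letgfs6
Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
State: Peer in Cluster (Disconnected)

Hostname: jc1letgfs7
Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
State: Peer in Cluster (Connected)

Hostname: jc1letgfs8
Uuid: 13f4ce3f-042e-4144-a76c-d2b1b91676bd
State: Peer in Cluster (Connected)
"""

# Splits "Peer in Cluster (Connected)" into the state and the connection flag.
STATE_RE = re.compile(r"^(?P<state>.*?)\s*\((?P<conn>[^()]*)\)\s*$")


def parse_peer_status(text: str) -> List[Dict[str, str]]:
    """Turn peer-status text into one dict per peer (hypothetical helper)."""
    peers: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Hostname":
            # Each "Hostname:" line starts a new peer record.
            current = {"hostname": value}
            peers.append(current)
        elif key == "Uuid" and current:
            current["uuid"] = value
        elif key == "State" and current:
            match = STATE_RE.match(value)
            current["state"] = match.group("state") if match else value
            current["connection"] = match.group("conn") if match else ""
    return peers


def unhealthy_peers(peers: List[Dict[str, str]]) -> List[str]:
    """Hostnames whose state is anything other than Peer in Cluster (Connected)."""
    return [
        p["hostname"]
        for p in peers
        if p.get("state") != "Peer in Cluster" or p.get("connection") != "Connected"
    ]


if __name__ == "__main__":
    print(unhealthy_peers(parse_peer_status(SAMPLE)))
```

Running it against the sample flags only jc1letgfs6. A "Peer Rejected (Connected)" entry, as seen from jc1letgfs8 earlier in the thread, would be flagged as well, since the check looks at both the state and the connection flag. To use it on a live node you would feed it the real command output in place of SAMPLE.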