What does this error mean?

After 'service glusterd restart' on node 6:

root@jc1letgfs5:/etc/glusterd/vols# gluster peer status
Number of Peers: 3

Hostname: jc1letgfs6
Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
State: Peer in Cluster (Disconnected)

Hostname: jc1letgfs7
Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
State: Peer in Cluster (Connected)

Hostname: jc1letgfs8
Uuid: 13f4ce3f-042e-4144-a76c-d2b1b91676bd
State: Peer in Cluster (Connected)

BUT ... after 'service glusterd restart' on node 5:

root@jc1letgfs5:/etc/glusterd/vols# gluster peer status
Number of Peers: 3

Hostname: jc1letgfs7
Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
State: Peer in Cluster (Connected)

Hostname: jc1letgfs8
Uuid: 13f4ce3f-042e-4144-a76c-d2b1b91676bd
State: Peer in Cluster (Connected)

Hostname: jc1letgfs6
Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
State: Peer in Cluster (Connected)

Works now. Thanks so much. I suspect a race condition of some sort, though exactly what kind I'll leave up to the devs.
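
For anyone who wants to spot this state without reading the output by eye, here is a minimal sketch that parses text in the format 'gluster peer status' printed above and lists peers that are not both "Peer in Cluster" and "Connected". The function names (parse_peer_status, problem_peers) are made up for illustration, and the parser assumes only the Hostname/Uuid/State layout seen in this thread; other GlusterFS versions may print something different.

```python
import re

# Matches lines such as "State: Peer in Cluster (Disconnected)".
STATE_RE = re.compile(r"^State:\s*(?P<state>.+?)\s*\((?P<connection>[^)]+)\)\s*$")


def parse_peer_status(output):
    """Parse text laid out like the 'gluster peer status' output in this thread.

    Returns a list of dicts with the keys hostname, uuid, state, connection.
    (Hypothetical helper; it assumes the Hostname/Uuid/State layout only.)
    """
    peers = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            # A Hostname line starts a new peer record.
            current = {"hostname": line.split(":", 1)[1].strip()}
            peers.append(current)
        elif current is None:
            # Skip anything before the first peer, e.g. "Number of Peers: 3".
            continue
        elif line.startswith("Uuid:"):
            current["uuid"] = line.split(":", 1)[1].strip()
        else:
            match = STATE_RE.match(line)
            if match:
                current["state"] = match.group("state")
                current["connection"] = match.group("connection")
    return peers


def problem_peers(peers):
    """Return hostnames of peers that are rejected or disconnected."""
    return [
        p["hostname"]
        for p in peers
        if p.get("state") != "Peer in Cluster" or p.get("connection") != "Connected"
    ]


# Sample taken from the jc1letgfs8 output quoted later in this thread.
SAMPLE = """\
Number of Peers: 3

Hostname: jc1letgfs6
Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
State: Peer Rejected (Connected)

Hostname: jc1letgfs7
Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
State: Peer in Cluster (Connected)

Hostname: 10.20.72.156
Uuid: 95e1d79a-632a-4774-9d7e-a7234cb084ca
State: Peer in Cluster (Connected)
"""

if __name__ == "__main__":
    print(problem_peers(parse_peer_status(SAMPLE)))
```

Running this on every node and comparing the results would show the kind of disagreement between nodes seen here, where one node reports a peer as disconnected and another reports the same peer as rejected.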

-----Original Message-----
From: Mohit Anchlia [mailto:mohitanchlia at gmail.com] 
Sent: Monday, March 21, 2011 2:57 PM
To: Burnash, James; gluster-users at gluster.org
Subject: Re: What does this error mean?

At this point, can you do /etc/init.d/gluster stop and then start, and
see if this changes anything? Or do you see the same behaviour? I am
thinking gluster might have tried to start too soon on reboot.
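
A rough way to check the "started too soon" theory is to test whether each peer's management port accepts TCP connections before and after the restart. The sketch below is only an illustration: port 24007 is the one named in the error quoted in this thread, while the function name port_reachable and the 3-second timeout are assumptions.

```python
import socket
import sys


def port_reachable(host, port=24007, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper. 24007 is the port from the "No route to host"
    error in this thread; the timeout value is an arbitrary choice.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable hosts,
        # and hostnames that fail to resolve.
        return False


if __name__ == "__main__":
    # Hosts are given on the command line, for example:
    #   python check_port.py jc1letgfs5 jc1letgfs6 jc1letgfs7 jc1letgfs8
    for host in sys.argv[1:]:
        status = "reachable" if port_reachable(host) else "NOT reachable"
        print(f"{host}:24007 {status}")
```

A successful connection only shows that something is listening on the port. It does not prove that the peers have completed their handshake, so 'gluster peer status' is still the final word.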

On Mon, Mar 21, 2011 at 11:43 AM, Burnash, James <jburnash at knight.com> wrote:
> Short answers - yes all on the same subnet.
> Every host can ping the others
> Iptables shows empty entries for all filters
>
> Details are here - http://pastebin.com/eKtRMbGE
>
> I did explicitly turn the iptables off again, and then checked again:
>
> jc1letgfs5
> Firewall is stopped.
>
> jc1letgfs6
> Firewall is stopped.
>
> jc1letgfs7
> Firewall is stopped.
>
> jc1letgfs8
> Firewall is stopped.
>
> Thanks,
>
> James
>
> -----Original Message-----
> From: Mohit Anchlia [mailto:mohitanchlia at gmail.com]
> Sent: Monday, March 21, 2011 2:25 PM
> To: Burnash, James
> Cc: gluster-users at gluster.org
> Subject: Re: What does this error mean?
>
> Are they in the same subnet? What happens if you ping these hosts
> individually? Do they ping?
>
> I looked closely at the error you posted, and "connection to
> 10.20.72.157:24007 failed (No route to host)" points to either a
> firewall issue or a switch issue on the network. A ping test from each
> host to every other host would be helpful.
>
> Can you post the results of ping and also "service iptables status" from each node?
>
> On Mon, Mar 21, 2011 at 11:16 AM, Burnash, James <jburnash at knight.com> wrote:
>> A little more information:
>>
>> From the original (first peer node):
>> root@jc1letgfs5:/etc/glusterd/vols# gluster peer status
>> Number of Peers: 3
>>
>> Hostname: jc1letgfs6
>> Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
>> State: Peer in Cluster (Disconnected)
>>
>> Hostname: jc1letgfs7
>> Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
>> State: Peer in Cluster (Connected)
>>
>> Hostname: jc1letgfs8
>> Uuid: 13f4ce3f-042e-4144-a76c-d2b1b91676bd
>> State: Peer in Cluster (Connected)
>>
>>
>> From the problem node:
>> *** NOTE - only one Peer seen
>> root@jc1letgfs6:~# gluster peer status
>> Number of Peers: 1
>>
>> Hostname: 10.20.72.156
>> Uuid: 95e1d79a-632a-4774-9d7e-a7234cb084ca
>> State: Peer in Cluster (Connected)
>>
>>
>> From a different peer node:
>> root@jc1letgfs8:~# gluster peer status
>> Number of Peers: 3
>>
>> Hostname: jc1letgfs6
>> Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
>> State: Peer Rejected (Connected)
>>
>> Hostname: jc1letgfs7
>> Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
>> State: Peer in Cluster (Connected)
>>
>> Hostname: 10.20.72.156
>> Uuid: 95e1d79a-632a-4774-9d7e-a7234cb084ca
>> State: Peer in Cluster (Connected)
>>
>> -----Original Message-----
>> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
>> Sent: Monday, March 21, 2011 2:05 PM
>> To: Mohit Anchlia
>> Cc: gluster-users at gluster.org
>> Subject: Re: What does this error mean?
>>
>> I did do this, and nothing in particular stands out.
>>
>> I'll exercise it some more, and see if we can get something that will at least point in the proper direction.
>>
>> I suspect that another reboot of the affected machine will fix this condition - but it won't help me understand the root problem the next time this happens.
>>
>> Thanks,
>>
>> James
>>
>> -----Original Message-----
>> From: Mohit Anchlia [mailto:mohitanchlia at gmail.com]
>> Sent: Monday, March 21, 2011 12:40 PM
>> To: Burnash, James
>> Cc: gluster-users at gluster.org
>> Subject: Re: What does this error mean?
>>
>> Can you turn on DEBUG and see if there is something that stands out?
>>
>> On Mon, Mar 21, 2011 at 9:34 AM, Burnash, James <jburnash at knight.com> wrote:
>>> Does anybody have any clue as to why this is happening? The problem has persisted for several days now, but I can't find anything at all in the logs to possibly explain why this is so.
>>>
>>> -----Original Message-----
>>> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
>>> Sent: Wednesday, March 16, 2011 9:10 AM
>>> To: gluster-users at gluster.org
>>> Subject: [SPAM?] What does this error mean?
>>> Importance: Low
>>>
>>> Hello.
>>>
>>> After purposely crashing node jc1letgfs6 (via 'echo b > /proc/sysrq-trigger') to test mirroring, I still see "Disconnected" for that node, even after it has rebooted and is back online, when I execute the following command on the first storage node:
>>>
>>> root@jc1letgfs5:/etc/glusterd/vols# gluster peer status
>>> Number of Peers: 3
>>>
>>> Hostname: jc1letgfs6
>>> Uuid: cd590fad-022c-4b9a-97f5-3262080d772d
>>> State: Peer in Cluster (Disconnected)
>>>
>>> Hostname: jc1letgfs7
>>> Uuid: c5f40de4-9bb1-47ad-93b6-d52c6689ee29
>>> State: Peer in Cluster (Disconnected)
>>>
>>> Hostname: jc1letgfs8
>>> Uuid: 13f4ce3f-042e-4144-a76c-d2b1b91676bd
>>> State: Peer in Cluster (Connected)
>>>
>>> This is running on 4 servers with CentOS 5.5 (x86_64), GlusterFS 3.1.1
>>>
>>> Here is the volume info:
>>>
>>> # gluster volume info
>>>
>>> Volume Name: test-pfs-ro1
>>> Type: Distributed-Replicate
>>> Status: Started
>>> Number of Bricks: 4 x 2 = 8
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: jc1letgfs5:/export/read-only/g01
>>> Brick2: jc1letgfs6:/export/read-only/g01
>>> Brick3: jc1letgfs5:/export/read-only/g02
>>> Brick4: jc1letgfs6:/export/read-only/g02
>>> Brick5: jc1letgfs7:/export/read-only/g01
>>> Brick6: jc1letgfs8:/export/read-only/g01
>>> Brick7: jc1letgfs7:/export/read-only/g02
>>> Brick8: jc1letgfs8:/export/read-only/g02
>>> Options Reconfigured:
>>> performance.stat-prefetch: on
>>> performance.cache-size: 2GB
>>> network.ping-timeout: 10
>>>
>>> Even with this error, mirroring functions as expected, and the node is recognized and utilized, as can be seen in this log fragment from jc1letgfs5: /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
>>>
>>> [2011-03-13 23:51:31.458329] E [socket.c:1656:socket_connect_finish] management: connection to 10.20.72.157:24007 failed (No route to host)
>>> [2011-03-13 23:53:49.42170] I [glusterd3_1-mops.c:172:glusterd3_1_friend_add_cbk] glusterd: Received ACC from uuid: cd590fad-022c-4b9a-97f5-3262080d772d, host: jc1letgfs6, port: 0
>>> [2011-03-13 23:53:49.42204] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
>>> [2011-03-13 23:53:49.42320] I [glusterd-utils.c:2062:glusterd_friend_find_by_uuid] glusterd: Friend found.. state: Peer in Cluster
>>> [2011-03-13 23:53:49.42336] I [glusterd-handler.c:2267:glusterd_handle_friend_update] glusterd: Received friend update from uuid: cd590fad-022c-4b9a-97f5-3262080d772d
>>> [2011-03-13 23:53:49.42359] I [glusterd-handler.c:2312:glusterd_handle_friend_update] : Received uuid: 95e1d79a-632a-4774-9d7e-a7234cb084ca, hostname:10.20.72.156
>>> [2011-03-13 23:53:49.42412] I [glusterd-handler.c:2315:glusterd_handle_friend_update] : Received my uuid as Friend
>>>
>>>
>>> Any pointers or help would be appreciated.
>>>
>>> James Burnash, Unix Engineering
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

