On 06/08/13 21:25, Kaushal M wrote:
> Toby,
> What versions of gluster are on the peers? And does the cluster have
> just two peers or more?

Version 3.3.1. The cluster has/had two nodes; we're trying to replace one
of them with another one.

> On Tue, Aug 6, 2013 at 4:32 PM, Toby Corkindale
> <toby.corkindale at strategicdata.com.au> wrote:
>> ----- Original Message -----
>>> From: "Toby Corkindale" <toby.corkindale at strategicdata.com.au>
>>> To: gluster-users at gluster.org
>>> Sent: Tuesday, 6 August, 2013 6:26:59 PM
>>> Subject: Re: peer status rejected (connected)
>>>
>>> On 06/08/13 18:12, Toby Corkindale wrote:
>>>> Hi,
>>>> What does it mean when you use "peer probe" to add a new host, but then
>>>> afterwards the "peer status" is reported as "Rejected" yet "Connected"?
>>>> And of course -- how does one fix this?
>>>>
>>>> gluster> peer status
>>>> Number of Peers: 1
>>>>
>>>> Hostname: 192.168.10.32
>>>> Uuid: 32497846-6e02-4b68-b147-6f4b936b3373
>>>> State: Peer Rejected (Connected)
>>>
>>> It's worth noting that the attempt to probe the peer was reported as
>>> successful, though:
>>>
>>> gluster> peer probe mel-storage04
>>>
>>> Probe successful
>>> gluster> peer status
>>> Number of Peers: 1
>>>
>>> Hostname: mel-storage04
>>> Uuid: 6254c24d-29d4-4794-8159-3c2b03b34798
>>> State: Peer Rejected (Connected)
>>
>> After searching around some more, I saw that this issue is usually caused
>> when two peers join and one of them has a very out-of-date volume list.
>> And indeed, in the log files I see messages about checksums failing to
>> agree on the volumes being exchanged.
>>
>> The odd thing is that this is a fresh server, running the same version of
>> glusterfs. I tried stopping the services entirely, running
>> rm -rf /var/lib/glusterfs/*, and then starting up again and probing that
>> peer -- and received the same rejection. I'm confused as to how it could
>> possibly be getting a different volume checksum when it didn't even have
>> its own copy.
>>
>> Does the community have any suggestions about resolving this?
>>
>> See also my separate message about being unable to remove or replace
>> bricks -- which might be related, although those errors occur even when
>> run on the cluster without this problematic peer attached at all.
>>
>> -Toby
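
P.S. For reference, the procedure I've seen suggested elsewhere for clearing
a "Peer Rejected" state is roughly the following. I haven't verified it
here, and it assumes the glusterd state directory is /var/lib/glusterd
(not /var/lib/glusterfs, which is what I cleared above):

  # On the rejected peer (mel-storage04):
  service glusterd stop       # the service is "glusterfs-server" on Debian/Ubuntu
  cd /var/lib/glusterd
  # Remove everything except glusterd.info, so the peer keeps its own UUID
  find . -mindepth 1 ! -name glusterd.info -delete
  service glusterd start

  # From the existing, healthy node, probe the peer again:
  gluster peer probe mel-storage04
  gluster peer status

  # Then restart glusterd on the rejected peer once more so it picks up
  # the volume definitions from the cluster:
  service glusterd restart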