Vijay,

You nailed it. Joe in IRC suggested checking whether a node was its own peer, and a few were. It's a long story, but the short of it is that a bad commit to our config system ended up dropping 3.4beta1 onto a handful of these nodes, and we've been cleaning that up for a little while now. This didn't manifest right away, however.

--
Matthew Nicholson
Research Computing Specialist
Harvard FAS Research Computing
matthew_nicholson at harvard.edu


On Tue, Jun 4, 2013 at 2:39 PM, Vijay Bellur <vbellur at redhat.com> wrote:

> On 06/04/2013 10:29 PM, Matthew Nicholson wrote:
>
>> So it sees something is holding the lock, rejects it.
>>
>> If I look up that uuid:
>>
>> [root at ox60-gstore10 ~]# gluster peer status | grep 0edce15e-0de2-4496-a520-58c65dbbc7da --context=3
>> Number of Peers: 20
>>
>> Hostname: ox60-gstore10
>> Uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da
>> State: Peer in Cluster (Connected)
>
> This seems to be a case of a server being a peer of itself, which is not
> required. The following steps might help when performed on ox60-gstore10:
>
> a) Take a backup of /var/lib/glusterd.
>
> b) Stop glusterd.
>
> c) Remove the file named 0edce15e-0de2-4496-a520-58c65dbbc7da in
> /var/lib/glusterd/peers/.
>
> d) Restart glusterd.
>
> At this point, ox60-gstore10 should no longer appear in the output of
> "gluster peer status" on ox60-gstore10 itself, but it should still appear in
> the output on the other nodes of the cluster. Once that state is reached,
> all volume operations should proceed normally.
>
> How did the setup get into such a state? Was a self probe attempted, or was
> /var/lib/glusterd cloned from one of its peers?
>
> -Vijay
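
For reference, Vijay's steps a) through d) translate roughly into the commands below, run on the affected node only (ox60-gstore10 in this thread). This is just a sketch: the backup path is arbitrary, the UUID must be the one your own "gluster peer status" reports, and the "service" invocations assume a SysV-init style system (RHEL/CentOS 6 era), so adjust for your init system.

[root at ox60-gstore10 ~]# cp -a /var/lib/glusterd /root/glusterd.bak                        # a) back up the glusterd config
[root at ox60-gstore10 ~]# service glusterd stop                                             # b) stop glusterd
[root at ox60-gstore10 ~]# rm /var/lib/glusterd/peers/0edce15e-0de2-4496-a520-58c65dbbc7da   # c) remove the self-peer file
[root at ox60-gstore10 ~]# service glusterd start                                            # d) restart glusterd
[root at ox60-gstore10 ~]# gluster peer status                                               # the node should no longer list itself as a peer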