held cluster lock blocking volume operations

If I strace a "gluster volume status", it hangs here:

epoll_wait(3, {{EPOLLOUT, {u32=5, u64=5}}}, 257, 4294967295) = 1
getsockopt(5, SOL_SOCKET, SO_ERROR, [150710196258209792], [4]) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(964),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
futex(0x63b7a4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x63b760, 2) = 1
futex(0x63b760, FUTEX_WAKE_PRIVATE, 1)  = 1
epoll_ctl(3, EPOLL_CTL_MOD, 5, {EPOLLIN|EPOLLPRI, {u32=5, u64=5}}) = 0
epoll_wait(3,

So the CLI is talking to localhost on port 964.

All nodes do that, but with different ports.
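
In case it helps anyone reproduce the check, here's a rough sketch of
confirming what's on the other end of that socket, rather than just the
local port getsockname() reports (untested; assumes pgrep and lsof are
installed, and the port number will differ per node):

# grab the pid of the hung CLI and list its TCP sockets;
# -a ANDs the -p and -i selections so only this pid's sockets show
pid=$(pgrep -f 'gluster volume status' | head -1)
lsof -a -p "$pid" -iTCP -n -P

The remote end should normally be glusterd on 24007.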



--
Matthew Nicholson
Research Computing Specialist
Harvard FAS Research Computing
matthew_nicholson at harvard.edu



On Tue, Jun 4, 2013 at 12:19 PM, Matthew Nicholson <
matthew_nicholson at harvard.edu> wrote:

> No, no duplicate UUIDs:
>
> [root at ox60-gstore01 ~]# gluster peer status |grep -i uuid | uniq -c
>       1 Uuid: 055a13fe-e40a-46ff-9011-6c81832e3ba1
>       1 Uuid: e0c267e6-3dc2-4623-89f1-4516f1285c1a
>       1 Uuid: e503bd2e-b2b2-49d4-ae05-45090e24acca
>       1 Uuid: 974a503e-4f0f-44f2-81df-5383c28cdf20
>       1 Uuid: 5517a055-c5f5-41b7-95d2-dedf6900be21
>       1 Uuid: 13cfacc1-65a4-4151-91d5-bc7977e01654
>       1 Uuid: a5de08c0-e761-45ee-a7ad-e8c556f2540b
>       1 Uuid: 428e11bc-5a80-41cb-af1d-a9023e2bc11b
>       1 Uuid: 113562a1-e521-4747-ae75-477614ea28cf
>       1 Uuid: 04c6c37b-743d-4f87-9bdc-3dfe1b573709
>       1 Uuid: 2225df4c-4510-457c-9958-0b6506ff25e4
>       1 Uuid: 6456206b-fe19-4b65-b7ab-0c9e7ce6221e
>       1 Uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da
>       1 Uuid: a327cd38-f98a-4554-ae62-97a21153f4d3
>       1 Uuid: a7d3a064-1bb4-4da0-a680-180db8150e4c
>       1 Uuid: 757297b4-5648-4e31-88f4-00fc167a43e4
>       1 Uuid: 725a2567-b668-4a5f-b2c9-5c7dcc90c846
>       1 Uuid: 303f4cc4-c8ae-42c7-b8cd-eafee8f95122
>       1 Uuid: 439f3ffa-e468-4a8b-801e-e2f20062e6f0
>       1 Uuid: cdba3b89-e804-4bf1-afb9-d7c231399955
>
> glusterd (as well as the glusterfs processes and the NFS server, which
> seemingly never dies when glusterd is shut down) has been restarted.
> Actually, we just went so far as to bounce one replica and then another
> (full reboot).
>
>
>
> --
> Matthew Nicholson
> Research Computing Specialist
> Harvard FAS Research Computing
> matthew_nicholson at harvard.edu
>
>
>
> On Tue, Jun 4, 2013 at 10:30 AM, Vijay Bellur <vbellur at redhat.com> wrote:
>
>> On 06/04/2013 07:57 PM, Matthew Nicholson wrote:
>>
>>> So, we've got a volume that is mostly functioning fine (it's up,
>>> accessible, etc.). However, volume operations fail or don't return on it.
>>>
>>>
>>> What I mean is:
>>>
>>> gluster peer status/probe/etc.: works
>>> gluster volume info: works
>>> gluster volume status/remove-brick/etc.: sits for a long time and
>>> returns nothing.
>>>
>>> The only things coming up in logs are:
>>>
>>> [2013-06-04 10:21:36.398072] I [glusterd-utils.c:285:glusterd_lock]
>>> 0-glusterd: Cluster lock held by 757297b4-5648-4e31-88f4-00fc167a43e4
>>> [2013-06-04 10:21:36.398123] I
>>> [glusterd-handler.c:463:glusterd_op_txn_begin] 0-management: Acquired
>>> local lock
>>> [2013-06-04 10:21:36.398424] I
>>> [glusterd-handler.c:502:glusterd_handle_cluster_lock] 0-glusterd:
>>> Received LOCK from uuid: 757297b4-5648-4e31-88f4-00fc167a43e4
>>> [2013-06-04 10:21:36.398448] E [glusterd-utils.c:277:glusterd_lock]
>>> 0-glusterd: Unable to get lock for uuid:
>>> 757297b4-5648-4e31-88f4-00fc167a43e4, lock held by:
>>> 757297b4-5648-4e31-88f4-00fc167a43e4
>>> [2013-06-04 10:21:36.398483] I
>>> [glusterd-handler.c:1322:glusterd_op_lock_send_resp] 0-glusterd:
>>> Responded, ret: 0
>>> [2013-06-04 10:21:36.398498] E [glusterd-op-sm.c:4624:glusterd_op_sm]
>>> 0-glusterd: handler returned: -1
>>>
>>> Notice that the UUID holding the lock and the UUID requesting the
>>> lock are the same. So it seems like a lock was "forgotten" about?
>>>
>>> Any thoughts on clearing this?
>>>
>>
>> Does gluster peer status list the same UUID more than once?
>>
>> If not, restarting the glusterd instance that owns the lock should address it.
>>
>> -Vijay
>>
>>
>
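
For the archives, a rough sketch of acting on Vijay's suggestion above:
find the node whose local UUID matches the lock holder from the log,
then restart glusterd there (assumes a stock RPM install, which keeps
the local UUID in /var/lib/glusterd/glusterd.info and ships a SysV
service script; adjust for your layout):

# on each node, check the local UUID
grep -i uuid /var/lib/glusterd/glusterd.info

# on the node reporting 757297b4-5648-4e31-88f4-00fc167a43e4
service glusterd restart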