If I strace a "gluster volume status" it hangs here:

epoll_wait(3, {{EPOLLOUT, {u32=5, u64=5}}}, 257, 4294967295) = 1
getsockopt(5, SOL_SOCKET, SO_ERROR, [150710196258209792], [4]) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(964), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
futex(0x63b7a4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x63b760, 2) = 1
futex(0x63b760, FUTEX_WAKE_PRIVATE, 1) = 1
epoll_ctl(3, EPOLL_CTL_MOD, 5, {EPOLLIN|EPOLLPRI, {u32=5, u64=5}}) = 0
epoll_wait(3,

so it is talking to localhost on port 964. All nodes do that, but with different ports.

--
Matthew Nicholson
Research Computing Specialist
Harvard FAS Research Computing
matthew_nicholson at harvard.edu


On Tue, Jun 4, 2013 at 12:19 PM, Matthew Nicholson <matthew_nicholson at harvard.edu> wrote:

> No, no duplicate UUIDs:
>
> [root at ox60-gstore01 ~]# gluster peer status | grep -i uuid | uniq -c
> 1 Uuid: 055a13fe-e40a-46ff-9011-6c81832e3ba1
> 1 Uuid: e0c267e6-3dc2-4623-89f1-4516f1285c1a
> 1 Uuid: e503bd2e-b2b2-49d4-ae05-45090e24acca
> 1 Uuid: 974a503e-4f0f-44f2-81df-5383c28cdf20
> 1 Uuid: 5517a055-c5f5-41b7-95d2-dedf6900be21
> 1 Uuid: 13cfacc1-65a4-4151-91d5-bc7977e01654
> 1 Uuid: a5de08c0-e761-45ee-a7ad-e8c556f2540b
> 1 Uuid: 428e11bc-5a80-41cb-af1d-a9023e2bc11b
> 1 Uuid: 113562a1-e521-4747-ae75-477614ea28cf
> 1 Uuid: 04c6c37b-743d-4f87-9bdc-3dfe1b573709
> 1 Uuid: 2225df4c-4510-457c-9958-0b6506ff25e4
> 1 Uuid: 6456206b-fe19-4b65-b7ab-0c9e7ce6221e
> 1 Uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da
> 1 Uuid: a327cd38-f98a-4554-ae62-97a21153f4d3
> 1 Uuid: a7d3a064-1bb4-4da0-a680-180db8150e4c
> 1 Uuid: 757297b4-5648-4e31-88f4-00fc167a43e4
> 1 Uuid: 725a2567-b668-4a5f-b2c9-5c7dcc90c846
> 1 Uuid: 303f4cc4-c8ae-42c7-b8cd-eafee8f95122
> 1 Uuid: 439f3ffa-e468-4a8b-801e-e2f20062e6f0
> 1 Uuid: cdba3b89-e804-4bf1-afb9-d7c231399955
>
> glusterd (as well as glusterfs and the nfs server, which seemingly never
> dies if glusterd is shut down) have all been restarted.
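[Editor's note: one caveat on the duplicate-UUID check quoted above. `uniq -c` only collapses *adjacent* identical lines, so a duplicated peer UUID that does not happen to be listed next to its twin would still show a count of 1. Piping through `sort` first makes the check reliable. A minimal sketch, using hypothetical placeholder UUIDs (`aaa`/`bbb`) in place of real peer output:]

```shell
# Placeholder peer list with a non-adjacent duplicate; without `sort`,
# `uniq -c` would report every line with a count of 1 and miss it.
printf 'Uuid: aaa\nUuid: bbb\nUuid: aaa\n' | sort | uniq -c | awk '$1 > 1'
# prints only the duplicated line, with its count of 2
```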
> Actually, we just went so far as to bounce one replica, then another
> (a reboot).
>
>
> --
> Matthew Nicholson
> Research Computing Specialist
> Harvard FAS Research Computing
> matthew_nicholson at harvard.edu
>
>
> On Tue, Jun 4, 2013 at 10:30 AM, Vijay Bellur <vbellur at redhat.com> wrote:
>
>> On 06/04/2013 07:57 PM, Matthew Nicholson wrote:
>>
>>> So, we've got a volume that is mostly functioning fine (it's up,
>>> accessible, etc.). However, volume operations fail or don't return on it.
>>>
>>> What I mean is:
>>>
>>> gluster peer status/probe/etc.: works
>>> gluster volume info: works
>>> gluster volume status/remove-brick/etc.: sits for a long time and
>>> returns nothing.
>>>
>>> The only things coming up in the logs are:
>>>
>>> [2013-06-04 10:21:36.398072] I [glusterd-utils.c:285:glusterd_lock]
>>> 0-glusterd: Cluster lock held by 757297b4-5648-4e31-88f4-00fc167a43e4
>>> [2013-06-04 10:21:36.398123] I
>>> [glusterd-handler.c:463:glusterd_op_txn_begin] 0-management: Acquired
>>> local lock
>>> [2013-06-04 10:21:36.398424] I
>>> [glusterd-handler.c:502:glusterd_handle_cluster_lock] 0-glusterd:
>>> Received LOCK from uuid: 757297b4-5648-4e31-88f4-00fc167a43e4
>>> [2013-06-04 10:21:36.398448] E [glusterd-utils.c:277:glusterd_lock]
>>> 0-glusterd: Unable to get lock for uuid:
>>> 757297b4-5648-4e31-88f4-00fc167a43e4, lock held by:
>>> 757297b4-5648-4e31-88f4-00fc167a43e4
>>> [2013-06-04 10:21:36.398483] I
>>> [glusterd-handler.c:1322:glusterd_op_lock_send_resp] 0-glusterd:
>>> Responded, ret: 0
>>> [2013-06-04 10:21:36.398498] E [glusterd-op-sm.c:4624:glusterd_op_sm]
>>> 0-glusterd: handler returned: -1
>>>
>>> If you notice, the UUID holding the lock and the UUID requesting the
>>> lock are the same. So it seems like a lock was "forgotten" about?
>>>
>>> Any thoughts on clearing this?
>>>
>>
>> Does gluster peer status list the same UUID more than once?
>>
>> If not, restarting the glusterd which is the lock owner should address it.
>>
>> -Vijay
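[Editor's note: Vijay's suggestion (restart the glusterd that owns the lock) requires mapping the lock-holder UUID from the log line back to a node. A sketch of that check, under the assumption (standard glusterd layout, not stated in the thread) that each node's own daemon UUID lives in `/var/lib/glusterd/glusterd.info` as a `UUID=` line; a temp file stands in for that path here so the snippet is self-contained:]

```shell
# UUID reported as the lock holder in the glusterd log above.
lock_holder=757297b4-5648-4e31-88f4-00fc167a43e4

# Stand-in for /var/lib/glusterd/glusterd.info (assumed location of the
# local daemon UUID); on a real node you would grep the actual file.
info=$(mktemp)
printf 'UUID=%s\n' "$lock_holder" > "$info"

# Run on each node; the node whose local UUID matches is the lock owner.
if grep -q "UUID=$lock_holder" "$info"; then
    echo "this node holds the lock; restart glusterd here"
    # service glusterd restart    (or: systemctl restart glusterd)
fi
rm -f "$info"
```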