Even more info: I only see the "Unable to get lock" messages on the node where I'm running the gluster volume command (status, in this instance), and the node always complains about itself.

For example, I run:

[root@ox60-gstore10 ~]# gluster volume status
[root@ox60-gstore10 ~]#

(it sits for a while, then just comes back empty).

The logs on that system (ox60-gstore10) yield:

==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
[2013-06-04 12:55:13.447584] I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by 0edce15e-0de2-4496-a520-58c65dbbc7da
[2013-06-04 12:55:13.447637] I [glusterd-handler.c:463:glusterd_op_txn_begin] 0-management: Acquired local lock
[2013-06-04 12:55:13.447868] I [glusterd-handler.c:502:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da
[2013-06-04 12:55:13.447898] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da, lock held by: 0edce15e-0de2-4496-a520-58c65dbbc7da
[2013-06-04 12:55:13.447932] I [glusterd-handler.c:1322:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
[2013-06-04 12:55:13.447945] E [glusterd-op-sm.c:4624:glusterd_op_sm] 0-glusterd: handler returned: -1
[2013-06-04 12:55:13.447971] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 725a2567-b668-4a5f-b2c9-5c7dcc90c846
[2013-06-04 12:55:13.447993] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 757297b4-5648-4e31-88f4-00fc167a43e4
[2013-06-04 12:55:13.448013] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received RJT from uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da
[2013-06-04 12:55:13.448035] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: a5de08c0-e761-45ee-a7ad-e8c556f2540b
[2013-06-04 12:55:13.448056] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 303f4cc4-c8ae-42c7-b8cd-eafee8f95122
[2013-06-04 12:55:13.448143] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: a327cd38-f98a-4554-ae62-97a21153f4d3
[2013-06-04 12:55:13.448166] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: cdba3b89-e804-4bf1-afb9-d7c231399955
[2013-06-04 12:55:13.448191] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 055a13fe-e40a-46ff-9011-6c81832e3ba1
[2013-06-04 12:55:13.448231] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: e0c267e6-3dc2-4623-89f1-4516f1285c1a
[2013-06-04 12:55:13.448257] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 6456206b-fe19-4b65-b7ab-0c9e7ce6221e
[2013-06-04 12:55:13.448282] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 439f3ffa-e468-4a8b-801e-e2f20062e6f0
[2013-06-04 12:55:13.448303] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 2225df4c-4510-457c-9958-0b6506ff25e4
[2013-06-04 12:55:13.448322] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: e503bd2e-b2b2-49d4-ae05-45090e24acca
[2013-06-04 12:55:13.448340] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 5517a055-c5f5-41b7-95d2-dedf6900be21
[2013-06-04 12:55:13.448358] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 974a503e-4f0f-44f2-81df-5383c28cdf20
[2013-06-04 12:55:13.448376] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 428e11bc-5a80-41cb-af1d-a9023e2bc11b

So it sees that something is holding the lock and rejects the request. If I look up that UUID:

[root@ox60-gstore10 ~]# gluster peer status |grep 0edce15e-0de2-4496-a520-58c65dbbc7da --context=3
Number of Peers: 20
Hostname: ox60-gstore10
Uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da
State: Peer in Cluster (Connected)

So it seems the node itself is holding the lock.

If I do this on another node in the cluster, I see the same thing (the node I'm checking the status from is holding a lock, gets rejected, and never gets any info back).

--
Matthew Nicholson
Research Computing Specialist
Harvard FAS Research Computing
matthew_nicholson at harvard.edu


On Tue, Jun 4, 2013 at 12:21 PM, Matthew Nicholson <matthew_nicholson at harvard.edu> wrote:

> If I strace a "gluster volume status" it hangs here:
>
> epoll_wait(3, {{EPOLLOUT, {u32=5, u64=5}}}, 257, 4294967295) = 1
> getsockopt(5, SOL_SOCKET, SO_ERROR, [150710196258209792], [4]) = 0
> getsockname(5, {sa_family=AF_INET, sin_port=htons(964), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
> futex(0x63b7a4, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0x63b760, 2) = 1
> futex(0x63b760, FUTEX_WAKE_PRIVATE, 1) = 1
> epoll_ctl(3, EPOLL_CTL_MOD, 5, {EPOLLIN|EPOLLPRI, {u32=5, u64=5}}) = 0
> epoll_wait(3,
>
> So it is talking to localhost on port 964.
>
> All nodes do that, but with different ports.
>
> On Tue, Jun 4, 2013 at 12:19 PM, Matthew Nicholson <matthew_nicholson at harvard.edu> wrote:
>
>> No, no duplicate UUIDs:
>>
>> [root@ox60-gstore01 ~]# gluster peer status |grep -i uuid | uniq -c
>> 1 Uuid: 055a13fe-e40a-46ff-9011-6c81832e3ba1
>> 1 Uuid: e0c267e6-3dc2-4623-89f1-4516f1285c1a
>> 1 Uuid: e503bd2e-b2b2-49d4-ae05-45090e24acca
>> 1 Uuid: 974a503e-4f0f-44f2-81df-5383c28cdf20
>> 1 Uuid: 5517a055-c5f5-41b7-95d2-dedf6900be21
>> 1 Uuid: 13cfacc1-65a4-4151-91d5-bc7977e01654
>> 1 Uuid: a5de08c0-e761-45ee-a7ad-e8c556f2540b
>> 1 Uuid: 428e11bc-5a80-41cb-af1d-a9023e2bc11b
>> 1 Uuid: 113562a1-e521-4747-ae75-477614ea28cf
>> 1 Uuid: 04c6c37b-743d-4f87-9bdc-3dfe1b573709
>> 1 Uuid: 2225df4c-4510-457c-9958-0b6506ff25e4
>> 1 Uuid: 6456206b-fe19-4b65-b7ab-0c9e7ce6221e
>> 1 Uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da
>> 1 Uuid: a327cd38-f98a-4554-ae62-97a21153f4d3
>> 1 Uuid: a7d3a064-1bb4-4da0-a680-180db8150e4c
>> 1 Uuid: 757297b4-5648-4e31-88f4-00fc167a43e4
>> 1 Uuid: 725a2567-b668-4a5f-b2c9-5c7dcc90c846
>> 1 Uuid: 303f4cc4-c8ae-42c7-b8cd-eafee8f95122
>> 1 Uuid: 439f3ffa-e468-4a8b-801e-e2f20062e6f0
>> 1 Uuid: cdba3b89-e804-4bf1-afb9-d7c231399955
>>
>> glusterd (as well as glusterfs and the NFS server, which seemingly never
>> dies if glusterd is shut down) have all been restarted. Actually, we just
>> went so far as to bounce one replica and then another (reboot).
>>
>> On Tue, Jun 4, 2013 at 10:30 AM, Vijay Bellur <vbellur at redhat.com> wrote:
>>
>>> On 06/04/2013 07:57 PM, Matthew Nicholson wrote:
>>>
>>>> So, we've got a volume that is mostly functioning fine (it's up,
>>>> accessible, etc.). However, volume operations fail or don't return on
>>>> it.
>>>>
>>>> What I mean is:
>>>>
>>>> gluster peer status/probe/etc : works
>>>> gluster volume info : works
>>>> gluster volume status/remove-brick/etc : sit for a long time and return
>>>> nothing.
>>>>
>>>> The only things coming up in the logs are:
>>>>
>>>> [2013-06-04 10:21:36.398072] I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by 757297b4-5648-4e31-88f4-00fc167a43e4
>>>> [2013-06-04 10:21:36.398123] I [glusterd-handler.c:463:glusterd_op_txn_begin] 0-management: Acquired local lock
>>>> [2013-06-04 10:21:36.398424] I [glusterd-handler.c:502:glusterd_handle_cluster_lock] 0-glusterd: Received LOCK from uuid: 757297b4-5648-4e31-88f4-00fc167a43e4
>>>> [2013-06-04 10:21:36.398448] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: 757297b4-5648-4e31-88f4-00fc167a43e4, lock held by: 757297b4-5648-4e31-88f4-00fc167a43e4
>>>> [2013-06-04 10:21:36.398483] I [glusterd-handler.c:1322:glusterd_op_lock_send_resp] 0-glusterd: Responded, ret: 0
>>>> [2013-06-04 10:21:36.398498] E [glusterd-op-sm.c:4624:glusterd_op_sm] 0-glusterd: handler returned: -1
>>>>
>>>> If you notice, the UUID holding the lock and the UUID requesting the
>>>> lock are the same. So it seems like a lock was "forgotten" about?
>>>>
>>>> Any thoughts on clearing this?
>>>>
>>>
>>> Does gluster peer status list the same UUID more than once?
>>>
>>> If not, restarting the glusterd which is the lock owner should address
>>> it.
>>>
>>> -Vijay
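
The two checks discussed in this thread (is the lock requester also the lock holder, and does any UUID appear more than once in the peer list) can be scripted. Below is a minimal Python sketch, assuming the glusterd log and "gluster peer status" text formats quoted in this thread. The function names self_held_locks and duplicate_uuids are hypothetical helpers made up for illustration; they are not part of GlusterFS.

```python
import re
from collections import Counter

# Matches the glusterd error line quoted in this thread:
# "... Unable to get lock for uuid: <requester>, lock held by: <holder>"
LOCK_ERR = re.compile(
    r"Unable to get lock for uuid: (?P<req>[0-9a-f-]{36}), "
    r"lock held by: (?P<holder>[0-9a-f-]{36})"
)

# Matches a "Uuid: <uuid>" line as printed by "gluster peer status".
UUID_LINE = re.compile(r"^\s*Uuid:\s*(?P<uuid>[0-9a-f-]{36})\s*$", re.IGNORECASE)


def self_held_locks(log_lines):
    """Return requester UUIDs that were refused a lock they already hold."""
    found = []
    for line in log_lines:
        m = LOCK_ERR.search(line)
        if m and m.group("req") == m.group("holder"):
            found.append(m.group("req"))
    return found


def duplicate_uuids(peer_status_lines):
    """Return UUIDs listed more than once, regardless of their order."""
    counts = Counter()
    for line in peer_status_lines:
        m = UUID_LINE.match(line)
        if m:
            counts[m.group("uuid").lower()] += 1
    return sorted(u for u, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Sample input copied from the log excerpt and peer list in this thread.
    log = [
        "[2013-06-04 12:55:13.447898] E [glusterd-utils.c:277:glusterd_lock] "
        "0-glusterd: Unable to get lock for uuid: "
        "0edce15e-0de2-4496-a520-58c65dbbc7da, lock held by: "
        "0edce15e-0de2-4496-a520-58c65dbbc7da",
    ]
    peers = [
        "Hostname: ox60-gstore10",
        "Uuid: 0edce15e-0de2-4496-a520-58c65dbbc7da",
        "State: Peer in Cluster (Connected)",
    ]
    print("self-held locks:", self_held_locks(log))
    print("duplicate UUIDs:", duplicate_uuids(peers))
```

Counting with a Counter does not depend on the order of the lines, whereas "uniq -c" only merges adjacent identical lines unless its input is sorted first.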