Hi,

I still don't know what caused it, whether it was the failure of one gluster node (which lost a SATA controller and was rebooted) or some user activity, but gluster has become quite unusable. Even basic commands like "gluster volume heal home0 info" don't work, either hanging or returning "operation failed". After numerous attempts I finally managed to stop the volume and restarted gluster on all nodes, but I still can't do anything useful. Most commands fail, and the log shows:

==> etc-glusterfs-glusterd.vol.log <==
[2012-12-13 14:59:49.713103] I [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume home0
[2012-12-13 14:59:49.713194] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: c3ce6b9c-6297-4e77-924c-b44e2c13e58f, lock held by: c3ce6b9c-6297-4e77-924c-b44e2c13e58f
[2012-12-13 14:59:49.713234] E [glusterd-handler.c:458:glusterd_op_txn_begin] 0-management: Unable to acquire local lock, ret: -1

I've googled and seen people hit this at times, but never found any resolutions. Is there some way to clear this lock? It has been in effect for well over an hour, so the generic 30-minute lock timeout that one search result mentioned doesn't seem to be at work here. Any help would be appreciated.
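One thing I notice is that the "lock held by" UUID in the log is the same UUID that is trying to acquire the lock, i.e. the node seems to be blocked by its own stale lock. A rough way to check which node a holder UUID belongs to (just a sketch; it assumes the stock state file /var/lib/glusterd/glusterd.info, which on my nodes stores the local UUID as a "UUID=<uuid>" line, and the helper name is mine):

```shell
# Sketch: does the lock-holder UUID from the glusterd log match the
# local UUID recorded in a node's glusterd.info state file?
lock_holder_is_local() {
    holder_uuid=$1   # UUID reported after "lock held by:" in the log
    info_file=$2     # path to glusterd.info on the node being checked
    grep -q "^UUID=${holder_uuid}" "$info_file" 2>/dev/null
}

# Usage on the affected node:
#   lock_holder_is_local c3ce6b9c-6297-4e77-924c-b44e2c13e58f \
#       /var/lib/glusterd/glusterd.info && echo "this node holds the lock"
# If it is not local, "gluster peer status" lists the peers' UUIDs.
```

Since as far as I understand the cluster lock only lives in glusterd's memory, restarting glusterd on the holder node is the only thing I can think of to drop it, but I already restarted gluster everywhere, so I'd be glad to hear of anything less drastic.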
[root@se1 home0]# gluster volume info

Volume Name: home0
Type: Distributed-Replicate
Volume ID: 8e594854-16e1-445e-8434-1d597cef1749
Status: Started
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: 192.168.1.241:/d35
Brick2: 192.168.1.242:/d35
Brick3: 192.168.1.243:/d35
Brick4: 192.168.1.244:/d35
Brick5: 192.168.1.245:/d35
Brick6: 192.168.1.240:/d35
Brick7: 192.168.1.241:/d36
Brick8: 192.168.1.242:/d36
Brick9: 192.168.1.243:/d36
Brick10: 192.168.1.244:/d36
Brick11: 192.168.1.245:/d36
Brick12: 192.168.1.240:/d36
Options Reconfigured:
cluster.quorum-type: auto
cluster.lookup-unhashed: off
performance.client-io-threads: on
cluster.data-self-heal: on
performance.stat-prefetch

[root@se1 home0]# gluster volume status
Status of volume: home0
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 192.168.1.241:/d35                                24009   Y       7137
Brick 192.168.1.242:/d35                                24009   Y       6804
Brick 192.168.1.243:/d35                                24009   Y       5763
Brick 192.168.1.244:/d35                                24009   Y       10378
Brick 192.168.1.245:/d35                                24009   Y       3770
Brick 192.168.1.240:/d35                                24009   Y       21112
Brick 192.168.1.241:/d36                                24010   Y       7143
Brick 192.168.1.242:/d36                                24010   Y       6810
Brick 192.168.1.243:/d36                                24010   Y       5771
Brick 192.168.1.244:/d36                                24010   Y       10384
Brick 192.168.1.245:/d36                                24010   Y       3781
Brick 192.168.1.240:/d36                                24010   Y       21120
NFS Server on localhost                                 38467   Y       13552
Self-heal Daemon on localhost                           N/A     Y       13792
NFS Server on 192.168.1.242                             38467   Y       21254
Self-heal Daemon on 192.168.1.242                       N/A     Y       21267
NFS Server on 192.168.1.243                             38467   Y       8865
Self-heal Daemon on 192.168.1.243                       N/A     Y       8871
NFS Server on 192.168.1.240                             38467   Y       18806
Self-heal Daemon on 192.168.1.240                       N/A     Y       19045
NFS Server on 192.168.1.244                             38467   Y       536
Self-heal Daemon on 192.168.1.244                       N/A     Y       745
NFS Server on 192.168.1.245                             38467   Y       8689
Self-heal Daemon on 192.168.1.245                       N/A     Y       8955

[root@se1 home0]#
[root@se1 home0]# gluster volume heal home0 info

==> cli.log <==
[2012-12-13 15:09:33.476616] W [rpc-transport.c:174:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"

==> etc-glusterfs-glusterd.vol.log <==
[2012-12-13 15:09:33.565022] I [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume home0
[2012-12-13 15:09:33.565122] I [glusterd-utils.c:285:glusterd_lock] 0-glusterd: Cluster lock held by c3ce6b9c-6297-4e77-924c-b44e2c13e58f
[2012-12-13 15:09:33.565136] I [glusterd-handler.c:463:glusterd_op_txn_begin] 0-management: Acquired local lock
[2012-12-13 15:09:33.565938] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 663ecbfb-4209-417e-a955-6c9f72751dbc
[2012-12-13 15:09:33.565999] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: f1a89ed2-a2f5-49a9-9482-1c6984c37945
[2012-12-13 15:09:33.566024] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: b1ce84be-de0b-4ae1-a1e8-758d828b8872
[2012-12-13 15:09:33.566047] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: 0f61d484-0f93-4144-b166-2145f4ea4427
[2012-12-13 15:09:33.566069] I [glusterd-rpc-ops.c:548:glusterd3_1_cluster_lock_cbk] 0-glusterd: Received ACC from uuid: d9b48655-4b25-4ad2-be19-c5ec8768a789
[2012-12-13 15:09:33.566224] I [glusterd-op-sm.c:2039:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 5 peers
[2012-12-13 15:09:33.566420] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: b1ce84be-de0b-4ae1-a1e8-758d828b8872
[2012-12-13 15:09:33.566450] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: d9b48655-4b25-4ad2-be19-c5ec8768a789
[2012-12-13 15:09:33.566499] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: f1a89ed2-a2f5-49a9-9482-1c6984c37945
[2012-12-13 15:09:33.566524] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: 0f61d484-0f93-4144-b166-2145f4ea4427
[2012-12-13 15:09:33.566667] I [glusterd-rpc-ops.c:881:glusterd3_1_stage_op_cbk] 0-glusterd: Received ACC from uuid: 663ecbfb-4209-417e-a955-6c9f72751dbc

<hangs here, Ctrl+C>

[root@se1 home0]# gluster volume heal home0
operation failed
[root@se1 home0]#

==> cli.log <==
[2012-12-13 15:10:00.686308] W [rpc-transport.c:174:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to "socket"
[2012-12-13 15:10:00.842108] I [cli-rpc-ops.c:5928:gf_cli3_1_heal_volume_cbk] 0-cli: Received resp to heal volume
[2012-12-13 15:10:00.842187] I [input.c:46:cli_batch] 0-: Exiting with: -1

==> etc-glusterfs-glusterd.vol.log <==
[2012-12-13 15:10:00.841789] I [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume home0
[2012-12-13 15:10:00.841910] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: c3ce6b9c-6297-4e77-924c-b44e2c13e58f, lock held by: c3ce6b9c-6297-4e77-924c-b44e2c13e58f
[2012-12-13 15:10:00.841926] E [glusterd-handler.c:458:glusterd_op_txn_begin] 0-management: Unable to acquire local lock, ret: -1

Mario Kadastik, PhD
Researcher

"Physics is like sex, sure it may have practical reasons, but that's not why we do it"
  -- Richard P. Feynman