The latest hang is a 3 node remove hang. I have stack traces, lockdump output from gfs_tool lockdump, and dlm_locks output from all 3 nodes. The exception is the lockdump output on node cl032, where gfs_tool itself is stuck in:

gfs_tool      D 00000008     0 20033   2778 (NOTLB)
f70c1d90 00000086 f70c1d7c 00000008 00000001 c03d8315 00000008 00000001
       d857ddc0 00001000 f70c1d8c c0180832 f689b2d8 e14890d0 00000000 c170e8c0
       c170df60 00000000 000975b2 6d2af78e 000044d3 e08b6ef0 e08b7050 00000000
Call Trace:
 [<c03d39d4>] wait_for_completion+0xa4/0xe0
 [<f8b3bd8b>] glock_wait_internal+0x3b/0x270 [gfs]
 [<f8b3c2f6>] gfs_glock_nq+0x86/0x130 [gfs]
 [<f8b3cae4>] gfs_glock_nq_init+0x34/0x50 [gfs]
 [<f8b56cda>] gfs_permission+0x4a/0x90 [gfs]
 [<c016c807>] permission+0x47/0x50
 [<c016e45f>] may_open+0x5f/0x220
 [<c016e6c7>] open_namei+0xa7/0x6e0
 [<c015d691>] filp_open+0x41/0x70
 [<c015daf6>] sys_open+0x46/0xa0
 [<c010537d>] sysenter_past_esp+0x52/0x71

The problem looks like it is on cl032, but is a little different:

dlm_recvd     D C170DF98     0 19721      4 19722 19720 (L-TLB)
c7a3dd30 00000046 eb6f1450 c170df98 0000399e c5cbd712 00000008 0000399e
       f5208dc0 c5d11e5d 0000399e c170df98 0000000a eb6f1450 00000000 c170e8c0
       c170df60 00000000 00000971 c5d17c58 0000399e d50488b0 d5048a10 00000000
Call Trace:
 [<c03d409c>] rwsem_down_write_failed+0x9c/0x18e
 [<f8b7a28d>] .text.lock.locking+0xa6/0x1c9 [dlm]
 [<f8b78c00>] dlm_lock_stage2+0x60/0xd0 [dlm]
 [<f8b7ae7a>] process_lockqueue_reply+0x3aa/0x770 [dlm]
 [<f8b7c286>] process_cluster_request+0x816/0xeb0 [dlm]
 [<f8b80917>] midcomms_process_incoming_buffer+0x167/0x270 [dlm]
 [<f8b7e249>] receive_from_sock+0x189/0x2e0 [dlm]
 [<f8b7f3a6>] process_sockets+0x76/0xc0 [dlm]
 [<f8b7f616>] dlm_recvd+0x86/0xa0 [dlm]
 [<c013426a>] kthread+0xba/0xc0
 [<c0103325>] kernel_thread_helper+0x5/0x10

There are also a bunch of 'df' processes from cron which are looping forever in the kernel. They are looping in stat_gfs_async().

So the problem is similar: a process stuck on a down_write of a res_lock.
I'm assuming that is causing all the other problems.

All the info is available here:

http://developer.osdl.org/daniel/GFS/rm.hang.07dec2004/

I've included the dlm_debug output also, but I do not know how to read the output.

I'm planning on rebooting with a kernel with more DEBUG options turned on (DEBUG_SLAB) to be sure that it is not accessing freed memory.

Any other ideas on debugging?

Daniel