I left some automated tests running over the weekend and ran into a umount hang. A single GFS file system was mounted on 2 nodes of a 3-node cluster. The test had just removed 2 subdirectories - one from each node. The test was then unmounting the file system from one node when the umount hung.

Here's a stack trace from the hung umount (on cl030):

(node cl030)
umount        D 00000008     0 14345  14339                     (NOTLB)
db259e04 00000086 db259df4 00000008 00000001 00000000 00000008 db259dc8
       eda96dc0 f15d0750 c044aac0 db259000 db259de4 c01196d1 f7cf0b90 450fa673
       c170df60 00000000 00049d65 44bb3183 0002dfe0 f15d0750 f15d08b0 c170df60
Call Trace:
 [<c03d39d4>] wait_for_completion+0xa4/0xe0
 [<f8aba97e>] kcl_leave_service+0xfe/0x180 [cman]
 [<f8b06756>] release_lockspace+0x2d6/0x2f0 [dlm]
 [<f8a9010c>] release_gdlm+0x1c/0x30 [lock_dlm]
 [<f8a903f4>] lm_dlm_unmount+0x24/0x50 [lock_dlm]
 [<f881e496>] lm_unmount+0x46/0xac [lock_harness]
 [<f8b8089f>] gfs_put_super+0x30f/0x3c0 [gfs]
 [<c01654fa>] generic_shutdown_super+0x18a/0x1a0
 [<c016608d>] kill_block_super+0x1d/0x40
 [<c01652a1>] deactivate_super+0x81/0xa0
 [<c017c6cc>] sys_umount+0x3c/0xa0
 [<c017c749>] sys_oldumount+0x19/0x20
 [<c010537d>] sysenter_past_esp+0x52/0x71

[root@cl030 proc]# cat /proc/cluster/services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[3 1 2]

DLM Lock Space:  "stripefs"                        222 275 run       S-13,210,1
[1 3]

Cat'ing /proc/cluster/services on the 2nd node (cl031) hangs:

[root@cl031 root]# cat /proc/cluster/services

From the 2nd node (cl031), here are some stack traces that might be interesting:

cman_serviced D 00000008     0  3818      6         12593   665 (L-TLB)
ebc23edc 00000046 ebc23ecc 00000008 00000001 00000010 00000008 00000002
       f7726dc0 00000000 00000000 f5a4b230 00000000 00000010 00000010 ebc23f24
       c170df60 00000000 000005a8 d42bcdab 0002e201 eb5119f0 eb511b50 ebc23f08
Call Trace:
 [<c03d409c>] rwsem_down_write_failed+0x9c/0x18e
 [<f8b06acb>] .text.lock.lockspace+0x4e/0x63 [dlm]
 [<f8a8daa2>] process_leave_stop+0x32/0x80 [cman]
 [<f8a8dcf2>] process_one_uevent+0xc2/0x100 [cman]
 [<f8a8e798>] process_membership+0xc8/0xca [cman]
 [<f8a8bf65>] serviced+0x165/0x1d0 [cman]
 [<c013426a>] kthread+0xba/0xc0
 [<c0103325>] kernel_thread_helper+0x5/0x10

cat /proc/cluster/services stack trace:

cat           D 00000008     0 22151      1               13435 (NOTLB)
c1f7ae90 00000086 c1f7ae7c 00000008 00000002 000000d0 00000008 c1f7ae74
       eb0acdc0 00000001 00000246 00000000 e20c4670 f474f1d0 00000000 c17168c0
       c1715f60 00000001 00159c05 bad07454 0003aa83 e20c4670 e20c47d0 00000000
Call Trace:
 [<c03d2b03>] __down+0x93/0xf0
 [<c03d2c93>] __down_failed+0xb/0x14
 [<f8a9053c>] .text.lock.sm_misc+0x2d/0x41 [cman]
 [<f8a90144>] sm_seq_next+0x34/0x50 [cman]
 [<c017e629>] seq_read+0x159/0x2b0
 [<c015e49f>] vfs_read+0xaf/0x120
 [<c015e74b>] sys_read+0x4b/0x80
 [<c010537d>] sysenter_past_esp+0x52/0x71

The full stack traces are available here:
http://developer.osdl.org/daniel/gfs_umount_hang/

I'm running on 2.6.9 and CVS code from Nov 9th.

Any ideas?

Daniel
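
P.S. In case it helps to reproduce, this is roughly the sequence the test goes through. It's only a sketch - the device, mount point, and directory names below are placeholders, not the paths the test actually uses:

    # GFS file system mounted on 2 of the 3 cluster nodes
    # on cl030:
    mount -t gfs /dev/pool/stripefs /mnt/stripefs    # placeholder device/mount point
    # on cl031:
    mount -t gfs /dev/pool/stripefs /mnt/stripefs

    # test removes one subdirectory from each node
    # on cl030:
    rm -rf /mnt/stripefs/dir1                        # placeholder directory names
    # on cl031:
    rm -rf /mnt/stripefs/dir2

    # test then unmounts from one node - this umount is what hangs
    # on cl030:
    umount /mnt/stripefs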