Hi,

we have the following deadlock situation:

A 2-node cluster consisting of node1 and node2. /usr/local is placed on a GFS filesystem mounted on both nodes. The lock manager is dlm. We are using RHEL4u4.

An strace of ls -l /usr/local/swadmin/mnx/xml ends up in

  lstat("/usr/local/swadmin/mnx/xml",

This happens on both cluster nodes.

All processes trying to access the directory /usr/local/swadmin/mnx/xml are in "Waiting for IO (D)" state, i.e. the system load is at about 400 ;-)

Any ideas?

A lockdump analysis with decipher_lockstate_dump and parse_lockdump shows the following output (the whole file is too large for the mailing list):

Entries: 101939
Glocks: 60112
PIDs: 751

4 chain:
lockdump.node1.dec
Glock (inode[2], 1114343)
  gl_flags = lock[1]
  gl_count = 5
  gl_state = shared[3]
  req_gh = yes
  req_bh = yes
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 1
  ail_bufs = no
  Request
    owner = 5856
    gh_state = exclusive[1]
    gh_flags = try[0] local_excl[5] async[6]
    error = 0
    gh_iflags = promote[1]
  Waiter3
    owner = 5856
    gh_state = exclusive[1]
    gh_flags = try[0] local_excl[5] async[6]
    error = 0
    gh_iflags = promote[1]
  Inode: busy
lockdump.node2.dec
Glock (inode[2], 1114343)
  gl_flags =
  gl_count = 2
  gl_state = unlocked[0]
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 0
  ail_bufs = no
  Inode:
    num = 1114343/1114343
    type = regular[1]
    i_count = 1
    i_flags =
    vnode = yes
lockdump.node1.dec
Glock (inode[2], 627732)
  gl_flags = dirty[5]
  gl_count = 379
  gl_state = exclusive[1]
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 58
  ail_bufs = no
  Holder
    owner = 5856
    gh_state = exclusive[1]
    gh_flags = try[0] local_excl[5] async[6]
    error = 0
    gh_iflags = promote[1] holder[6] first[7]
  Waiter2
    owner = none[-1]
    gh_state = shared[3]
    gh_flags = try[0]
    error = 0
    gh_iflags = demote[2] alloced[4] dealloc[5]
  Waiter3
    owner = 32753
    gh_state = shared[3]
    gh_flags = any[3]
    error = 0
    gh_iflags = promote[1]
  [...loads of Waiter3 entries...]
  Waiter3
    owner = 4566
    gh_state = shared[3]
    gh_flags = any[3]
    error = 0
    gh_iflags = promote[1]
  Inode: busy
lockdump.node2.dec
Glock (inode[2], 627732)
  gl_flags = lock[1]
  gl_count = 375
  gl_state = unlocked[0]
  req_gh = yes
  req_bh = yes
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 0
  ail_bufs = no
  Request
    owner = 20187
    gh_state = shared[3]
    gh_flags = any[3]
    error = 0
    gh_iflags = promote[1]
  Waiter3
    owner = 20187
    gh_state = shared[3]
    gh_flags = any[3]
    error = 0
    gh_iflags = promote[1]
  [...loads of Waiter3 entries...]
  Waiter3
    owner = 10460
    gh_state = shared[3]
    gh_flags = any[3]
    error = 0
    gh_iflags = promote[1]
  Inode: busy
2 requests

--
Regards,

Mark Hlawatschek
http://www.atix.de/ http://www.open-sharedroot.org/

ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany
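
For reference, the relationship visible in the dump above (on node1, PID 5856 holds glock 627732 exclusively while it is itself queued on glock 1114343) can be pulled out of a deciphered dump mechanically. The following is only a minimal Python sketch and not part of decipher_lockstate_dump or parse_lockdump. The function names parse_glocks and holders_that_wait are hypothetical, and the code assumes a one-field-per-line layout in which each Holder/Request/Waiter section is followed by an "owner = <pid>" line. It is meant to be fed one node's dump at a time.

```python
import re
from collections import defaultdict

# Abbreviated node1 sample using values from the dump in this post.
# The one-field-per-line layout is an assumption about the deciphered format.
SAMPLE = """\
Glock (inode[2], 1114343)
  gl_state = shared[3]
  Request
    owner = 5856
    gh_state = exclusive[1]
  Waiter3
    owner = 5856
    gh_state = exclusive[1]
  Inode: busy
Glock (inode[2], 627732)
  gl_state = exclusive[1]
  Holder
    owner = 5856
    gh_state = exclusive[1]
  Waiter2
    owner = none[-1]
    gh_state = shared[3]
  Waiter3
    owner = 32753
    gh_state = shared[3]
  Waiter3
    owner = 4566
    gh_state = shared[3]
  Inode: busy
"""

GLOCK_RE = re.compile(r"Glock \((\w+)\[\d+\], (\d+)\)")
SECTION_RE = re.compile(r"^(Holder|Request|Waiter\d+)$")
OWNER_RE = re.compile(r"^owner = (\d+)$")  # skips "owner = none[-1]"


def parse_glocks(text):
    """Return {glock_number: {"holders": set_of_pids, "waiters": set_of_pids}}."""
    glocks = {}
    current = None   # entry of the glock being read
    section = None   # Holder / Request / WaiterN currently being read
    for raw in text.splitlines():
        line = raw.strip()
        m = GLOCK_RE.search(line)
        if m:
            number = int(m.group(2))
            current = glocks.setdefault(number, {"holders": set(), "waiters": set()})
            section = None
            continue
        m = SECTION_RE.match(line)
        if m:
            section = m.group(1)
            continue
        m = OWNER_RE.match(line)
        if m and current is not None and section is not None:
            # Request and WaiterN sections are both treated as "waiting".
            key = "holders" if section == "Holder" else "waiters"
            current[key].add(int(m.group(1)))
    return glocks


def holders_that_wait(glocks):
    """Return {pid: (held_glocks, awaited_glocks)} for PIDs that hold one
    glock while being queued on a different one."""
    held = defaultdict(set)
    waiting = defaultdict(set)
    for number, entry in glocks.items():
        for pid in entry["holders"]:
            held[pid].add(number)
        for pid in entry["waiters"]:
            waiting[pid].add(number)
    result = {}
    for pid, held_set in held.items():
        awaited = waiting[pid] - held_set
        if awaited:
            result[pid] = (held_set, awaited)
    return result


if __name__ == "__main__":
    for pid, (held_set, awaited) in holders_that_wait(parse_glocks(SAMPLE)).items():
        print(f"PID {pid} holds {sorted(held_set)} and waits for {sorted(awaited)}")
```

Run on the abbreviated sample, this reports PID 5856 as the only process that holds one glock while waiting for another. It says nothing about why the request on 1114343 is not granted.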