Re: gfs deadlock situation

Mark Hlawatschek wrote:
On Tuesday 13 February 2007 20:56, Wendy Cheng wrote:
Wendy Cheng wrote:
So it is removing a file. It has obtained the directory lock and is
waiting for the file lock. Looks to me like the DLM (LM_CB_ASYNC) callback never
occurs. Do you have any abnormal messages in your /var/log/messages file?
Dave, how do we dump the locks from the DLM side to see what DLM is thinking?
shell> cman_tool services    /* find your lock space name */
shell> echo "lock-space-name-found-above" > /proc/cluster/dlm_locks
shell> cat /proc/cluster/dlm_locks

Then try to find the lock (2, hex of 1114343) and cut and paste the
contents of that file here.
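
For example, something like this should locate the resource (a sketch only - the resource name in the dump is "<lock type> <inode number in hex>", and the whitespace padding inside the name varies, so grepping on the hex number alone is safest):

shell> printf '%x\n' 1114343                 /* prints 1100e7 */
shell> cat /proc/cluster/dlm_locks > /tmp/dlm_dump
shell> grep -A 6 '1100e7' /tmp/dlm_dump      /* show the resource and its queues */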

Syslog seems to be OK.
Note that process 5856 is running on node1.
What I was looking for is a lock (type=2 and lock number=0x1100e7 (=1114343)) - that's the lock that hangs process id=5856. Since pid=5856 also holds an exclusive lock on another directory, nobody can access that directory.

Apparently from the GFS end, node 2 thinks 0x1100e7 is "unlocked" and node 1 is waiting for it. The only thing that can get node 1 out of this wait state is DLM's callback. If DLM doesn't have any record of this lock, pid=5856 will wait forever. Are you sure this is the whole file of DLM output? This lock has somehow disappeared from DLM and I have no idea why we got into this state.

If the files are too large, could you tar them and email them over? I would like to see complete glock and dlm lock dumps on both nodes (4 files here). If possible, add the following two outputs (so 6 files total):

shell> cd /tmp  /* on both nodes */
shell> script /* this should generate a file called typescript in /tmp directory */
shell> crash
crash> foreach bt /* keep hitting the space bar until this command runs through */
crash> quit
shell> <cntl><d> /* this should close out typescript file */
shell> mv typescript nodex_crash /* x=1, 2 based on node1 or node2 */
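
As for the glock and dlm lock dumps themselves, something along these lines should produce them on each node (a sketch only - it assumes /usr/local is the GFS mount point and reuses the /proc/cluster/dlm_locks steps from above):

shell> gfs_tool lockdump /usr/local > /tmp/glock_dump_x   /* x=1, 2 based on node1 or node2 */
shell> cman_tool services                                 /* find your lock space name */
shell> echo "lock-space-name-found-above" > /proc/cluster/dlm_locks
shell> cat /proc/cluster/dlm_locks > /tmp/dlm_dump_x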

Tar these 6 files (glock_dump_1, glock_dump_2, dlm_dump_1, dlm_dump_2, node1_crash, node2_crash) and email them to wcheng@xxxxxxxxxx
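
For example, after copying the files from both nodes into /tmp on one machine (a sketch; the archive name is just an example):

shell> cd /tmp
shell> tar czf gfs_deadlock_dumps.tar.gz glock_dump_1 glock_dump_2 dlm_dump_1 dlm_dump_2 node1_crash node2_crash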

Thank you for the help, if you can.

-- Wendy

Here's the dlm output:

node1:
Resource 0000010001218088 (parent 0000000000000000). Name (len=24) " 2 1100e7"
Local Copy, Master is node 2
Granted Queue
Conversion Queue
Waiting Queue
5eb00178 PR (EX) Master:     3eeb0117  LQ: 0,0x5
[...]
Resource 00000100f56f0618 (parent 0000000000000000). Name (len=24) " 5 1100e7"
Local Copy, Master is node 2
Granted Queue
5bc20257 PR Master:     3d9703e0
Conversion Queue
Waiting Queue

node2:
Resource 00000107e462c8c8 (parent 0000000000000000). Name (len=24) " 2 1100e7"
Master Copy
Granted Queue
3eeb0117 PR Remote:   1 5eb00178
Conversion Queue
Waiting Queue
[...]
Resource 000001079f7e81d8 (parent 0000000000000000). Name (len=24) " 5 1100e7"
Master Copy
Granted Queue
3d9703e0 PR Remote:   1 5bc20257
3e500091 PR
Conversion Queue
Waiting Queue

Thanks for your help,

Mark

We have the following deadlock situation:

A 2-node cluster consisting of node1 and node2.
/usr/local is placed on a GFS filesystem mounted on both nodes.
The lock manager is DLM.
We are using RHEL4u4.

An strace of ls -l /usr/local/swadmin/mnx/xml ends up hanging in
lstat("/usr/local/swadmin/mnx/xml",

This happens on both cluster nodes.

All processes trying to access the directory /usr/local/swadmin/mnx/xml
are in "Waiting for IO (D)" state, so the system load is at about 400
;-)
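
A quick way to list those stuck processes is something like this (a sketch; the ps column syntax may vary):

shell> ps -eo pid,stat,wchan:25,cmd | awk '$2 ~ /^D/'   /* processes in uninterruptible sleep */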

Any ideas?
Quickly browsing this, it looks to me like the process with pid=5856 got stuck.
That process held the exclusive lock on the file or directory (inode number
627732 - probably /usr/local/swadmin/mnx/xml), so everyone else was waiting
for it. The faulty process was apparently in the middle of obtaining another
exclusive lock (and almost got it). We need to know where pid=5856 was
stuck at that time. If this occurs again, could you use "crash" to back
trace that process and show us the output?
Here's the crash output:

crash> bt 5856
PID: 5856   TASK: 10bd26427f0       CPU: 0   COMMAND: "java"
 #0 [10bd20cfbc8] schedule at ffffffff8030a1d1
 #1 [10bd20cfca0] wait_for_completion at ffffffff8030a415
 #2 [10bd20cfd20] glock_wait_internal at ffffffffa018574e
 #3 [10bd20cfd60] gfs_glock_nq_m at ffffffffa01860ce
 #4 [10bd20cfda0] gfs_unlink at ffffffffa019ce41
 #5 [10bd20cfea0] vfs_unlink at ffffffff801889fa
 #6 [10bd20cfed0] sys_unlink at ffffffff80188b19
 #7 [10bd20cff30] filp_close at ffffffff80178e48
 #8 [10bd20cff50] error_exit at ffffffff80110d91
    RIP: 0000002a9593f649  RSP: 0000007fbfffbca0  RFLAGS: 00010206
    RAX: 0000000000000057  RBX: ffffffff8011026a  RCX: 0000002a9cc9c870
    RDX: 0000002ae5989000  RSI: 0000002a962fa3a8  RDI: 0000002ae5989000
    RBP: 0000000000000000   R8: 0000002a9630abb0   R9: 0000000000000ffc
    R10: 0000002a9630abc0  R11: 0000000000000206  R12: 0000000040115700
    R13: 0000002ae23294b0  R14: 0000007fbfffc300  R15: 0000002ae5989000
    ORIG_RAX: 0000000000000057  CS: 0033  SS: 002b

A lockdump analysis with the decipher_lockstate_dump and parse_lockdump
scripts shows the following output (the whole file is too large for the
mailing list; a sketch of how such a dump can be produced follows after
the excerpt):

Entries:  101939
Glocks:  60112
PIDs:  751

4 chain:
lockdump.node1.dec Glock (inode[2], 1114343)
  gl_flags = lock[1]
  gl_count = 5
  gl_state = shared[3]
  req_gh = yes
  req_bh = yes
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 1
  ail_bufs = no
  Request
    owner = 5856
    gh_state = exclusive[1]
    gh_flags = try[0] local_excl[5] async[6]
    error = 0
    gh_iflags = promote[1]
  Waiter3
    owner = 5856
    gh_state = exclusive[1]
    gh_flags = try[0] local_excl[5] async[6]
    error = 0
    gh_iflags = promote[1]
  Inode: busy
lockdump.node2.dec Glock (inode[2], 1114343)
  gl_flags =
  gl_count = 2
  gl_state = unlocked[0]
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 0
  ail_bufs = no
  Inode:
    num = 1114343/1114343
    type = regular[1]
    i_count = 1
    i_flags =
    vnode = yes
lockdump.node1.dec Glock (inode[2], 627732)
  gl_flags = dirty[5]
  gl_count = 379
  gl_state = exclusive[1]
  req_gh = no
  req_bh = no
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 58
  ail_bufs = no
  Holder
    owner = 5856
    gh_state = exclusive[1]
    gh_flags = try[0] local_excl[5] async[6]
    error = 0
    gh_iflags = promote[1] holder[6] first[7]
  Waiter2
    owner = none[-1]
    gh_state = shared[3]
    gh_flags = try[0]
    error = 0
    gh_iflags = demote[2] alloced[4] dealloc[5]
  Waiter3
    owner = 32753
    gh_state = shared[3]
    gh_flags = any[3]
    error = 0
    gh_iflags = promote[1]
  [...loads of Waiter3 entries...]
  Waiter3
    owner = 4566
    gh_state = shared[3]
    gh_flags = any[3]
    error = 0
    gh_iflags = promote[1]
  Inode: busy
lockdump.node2.dec Glock (inode[2], 627732)
  gl_flags = lock[1]
  gl_count = 375
  gl_state = unlocked[0]
  req_gh = yes
  req_bh = yes
  lvb_count = 0
  object = yes
  new_le = no
  incore_le = no
  reclaim = no
  aspace = 0
  ail_bufs = no
  Request
    owner = 20187
    gh_state = shared[3]
    gh_flags = any[3]
    error = 0
    gh_iflags = promote[1]
  Waiter3
    owner = 20187
    gh_state = shared[3]
    gh_flags = any[3]
    error = 0
    gh_iflags = promote[1]
  [...loads of Waiter3 entries...]
  Waiter3
    owner = 10460
    gh_state = shared[3]
    gh_flags = any[3]
    error = 0
    gh_iflags = promote[1]
  Inode: busy
2 requests
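
For reference, the dump excerpted above was produced roughly as follows (a sketch only - the exact invocation of the decipher_lockstate_dump and parse_lockdump scripts may differ, and /usr/local is assumed to be the GFS mount point):

shell> gfs_tool lockdump /usr/local > lockdump.nodeX            /* raw glock dump; run on each node, X=1, 2 */
shell> decipher_lockstate_dump lockdump.nodeX > lockdump.nodeX.dec
shell> parse_lockdump lockdump.node1.dec lockdump.node2.dec     /* finds chains of glocks blocking each other */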
--


--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
