Re: [Linux-cluster] umount hung single node

David Teigland <teigland@xxxxxxxxxx> · Wed, 16 Mar 2005 10:28:17 +0800

> I upgraded to the latest CVS and hit the same problem again.
> 
> umount is hung:
> root     24099 24093  0 Mar14 ?        00:00:02 umount /gfs_stripe5
> 
> and dlm_astd is spinning:
> 23895 root      20  -5     0    0    0 R 99.9  0.0   1479:34 dlm_astd
> 
> Any ideas?  Is there any debug info that would be useful?

Try 'cat /proc/cluster/dlm_stats' a few times over several seconds to see
whether any of the values change; if they do, it would help to know which
ones (especially the AST numbers).
The usual lock and debug dumps might also help:

  echo <lockspace name> >> /proc/cluster/dlm_locks
  cat /proc/cluster/dlm_locks > dlm_locks.txt
  cat /proc/cluster/dlm_debug > dlm_debug.txt
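
Checking whether the dlm_stats values move can be scripted with a small
userspace helper. This is only a sketch: the function names are
illustrative and not part of the cluster tools, and it assumes that
/proc/cluster/dlm_stats prints one value per line in a stable order
between reads, so it compares two samples line by line instead of
parsing field names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read up to size-1 bytes of a /proc file into buf and NUL-terminate it.
 * /proc files usually report a size of 0, so read until EOF rather than
 * trusting stat(). Returns the number of bytes read, or -1 on error. */
long snapshot_file(const char *path, char *buf, size_t size)
{
	FILE *f;
	size_t n = 0, r;

	assert(buf != NULL && size > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (n < size - 1 && (r = fread(buf + n, 1, size - 1 - n, f)) > 0)
		n += r;
	buf[n] = '\0';
	fclose(f);
	return (long)n;
}

/* Compare two snapshots line by line (assumes the line order is stable
 * between reads) and print each changed line as a -/+ pair.
 * Returns the number of lines that differ. */
int print_changed_lines(const char *before, const char *after, FILE *out)
{
	int changed = 0;

	while (*before || *after) {
		size_t lb = strcspn(before, "\n");
		size_t la = strcspn(after, "\n");

		if (lb != la || strncmp(before, after, lb) != 0) {
			fprintf(out, "- %.*s\n+ %.*s\n",
				(int)lb, before, (int)la, after);
			changed++;
		}
		/* step past this line and its newline, if there is one */
		before += lb + (before[lb] == '\n');
		after += la + (after[la] == '\n');
	}
	return changed;
}

/* Hypothetical driver: sample the file twice, 'seconds' apart, and
 * print what changed. Returns the change count, or -1 on read error. */
int watch_stats(const char *path, unsigned int seconds)
{
	static char a[8192], b[8192];

	if (snapshot_file(path, a, sizeof(a)) < 0)
		return -1;
	sleep(seconds);
	if (snapshot_file(path, b, sizeof(b)) < 0)
		return -1;
	return print_changed_lines(a, b, stdout);
}
```

For example, watch_stats("/proc/cluster/dlm_stats", 5) would print a
-/+ pair for every line whose value moved during those five seconds.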

I'm at a real loss for a good way to see what's happening, though.
The attached patch may at least tell us which loop it's stuck in.
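
The idea the patch relies on, counting passes through each loop and
reporting once a count exceeds a limit that no healthy pass should
reach, can be tried out in userspace. A minimal sketch follows; the
names (loop_watch, run_nested) are hypothetical, not part of the dlm
code, and the limits are arbitrary.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical loop watchdog: counts iterations and reports once when
 * the count first exceeds a limit, naming the loop that is spinning. */
struct loop_watch {
	const char *name;	/* label printed in the report */
	unsigned long count;	/* iterations seen so far */
	unsigned long limit;	/* report when count first exceeds this */
};

/* Returns 1 exactly on the iteration that crosses the limit, else 0. */
int loop_watch_tick(struct loop_watch *w)
{
	assert(w != NULL);
	if (++w->count == w->limit + 1) {
		fprintf(stderr, "%s: stuck? %lu iterations\n",
			w->name, w->count);
		return 1;
	}
	return 0;
}

/* Same shape as the patch: an outer retry loop whose counter is never
 * reset (like debug2) and an inner walk whose counter is reset on every
 * outer pass (like debug). Returns the total number of reports. */
int run_nested(unsigned long passes, unsigned long items,
	       unsigned long outer_limit, unsigned long inner_limit)
{
	struct loop_watch outer = { "outer for(;;)", 0, outer_limit };
	struct loop_watch inner = { "inner foreach", 0, inner_limit };
	unsigned long p, i;
	int reports = 0;

	for (p = 0; p < passes; p++) {
		reports += loop_watch_tick(&outer);
		inner.count = 0;	/* mirrors "debug = 0" */
		for (i = 0; i < items; i++)
			reports += loop_watch_tick(&inner);
	}
	return reports;
}
```

Unlike the patch, which calls printk() on every iteration once a limit
is exceeded, this version reports only on the crossing itself, so the
outer loop produces one line in total and the inner walk at most one
per outer pass.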

-- 
Dave Teigland  <teigland@xxxxxxxxxx>
Index: ast.c
===================================================================
RCS file: /cvs/cluster/cluster/dlm-kernel/src/ast.c,v
retrieving revision 1.24
diff -u -r1.24 ast.c

--- ast.c       11 Mar 2005 08:15:59 -0000      1.24
+++ ast.c       16 Mar 2005 02:21:09 -0000
@@ -199,13 +199,21 @@
        void (*bast) (long param, int mode);
        long astparam;
        uint16_t flags = 0, found;
+       uint32_t debug, debug2 = 0;
 
        for (;;) {
+               if (++debug2 > 20000)
+                       printk("ast for stuck\n");
+               debug = 0;
                found = FALSE;
                down(&ast_queue_lock);
                list_for_each_entry(lkb, &ast_queue, lkb_astqueue) {
                        rsb = lkb->lkb_resource;
                        ls = rsb->res_ls;
+                       if (++debug > 10000)
+                               printk("ast foreach stuck lkb %x %x rsb %s\n",
+                                       lkb->lkb_id, lkb->lkb_astflags,
+                                       rsb->res_name);
 
                        /* don't deliver ast's for locks in lockspaces
                           being recovered */