> I upgrade to the latest cvs and I hit the same problem again.
>
> umount is hung:
> root 24099 24093 0 Mar14 ? 00:00:02 umount /gfs_stripe5
>
> and dlm_astd is spinning:
> 23895 root 20 -5 0 0 0 R 99.9 0.0 1479:34 dlm_astd
>
> Any ideas? Is there any debug info that would be useful?

Try 'cat /proc/cluster/dlm_stats' a few times over the span of a few seconds to see whether any of the values are changing. If they are, it will help to know which ones (especially the AST numbers).

The other standard debugging output might also help:

  echo <lockspace name> >> /proc/cluster/dlm_locks
  cat /proc/cluster/dlm_locks > dlm_locks.txt
  cat /proc/cluster/dlm_debug > dlm_debug.txt

I'm at a real loss for a good way to see what's happening, though. The attached patch may at least tell us which loop it's stuck in.

-- 
Dave Teigland <teigland@xxxxxxxxxx>
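Comparing two reads of dlm_stats by eye gets tedious if the file is long. Below is a minimal C sketch that does the comparison. It assumes (an assumption, not something the DLM documents) that /proc/cluster/dlm_stats is plain text with one value per line, in the same order on every read. read_snapshot() and count_changed_lines() are hypothetical helpers, not part of any cluster tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helpers, not part of the DLM or its tools.
 * Assumes the stats file is line-oriented text whose lines keep the
 * same order between reads.
 */

/* Read up to len-1 bytes of a (proc) file into buf and NUL-terminate it.
 * Returns the number of bytes read, or -1 if the file can't be opened. */
long read_snapshot(const char *path, char *buf, size_t len)
{
	FILE *f;
	size_t n;

	assert(buf != NULL && len > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, len - 1, f);
	buf[n] = '\0';
	fclose(f);
	return (long)n;
}

/* Compare two snapshots line by line, print each pair of lines that
 * differ, and return how many lines differed. */
int count_changed_lines(const char *before, const char *after)
{
	int changed = 0;

	while (*before || *after) {
		size_t lb = strcspn(before, "\n");	/* length of current line */
		size_t la = strcspn(after, "\n");

		if (lb != la || strncmp(before, after, lb) != 0) {
			printf("was: %.*s\nnow: %.*s\n",
			       (int)lb, before, (int)la, after);
			changed++;
		}
		/* step past this line and its newline, if any */
		before += lb;
		after += la;
		if (*before == '\n')
			before++;
		if (*after == '\n')
			after++;
	}
	return changed;
}
```

For example: call read_snapshot() on /proc/cluster/dlm_stats into one buffer, sleep(3), read it into a second buffer, then pass both buffers to count_changed_lines(). Any line it prints is a value that moved while dlm_astd was spinning.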
Index: ast.c
===================================================================
RCS file: /cvs/cluster/cluster/dlm-kernel/src/ast.c,v
retrieving revision 1.24
diff -u -r1.24 ast.c
--- ast.c	11 Mar 2005 08:15:59 -0000	1.24
+++ ast.c	16 Mar 2005 02:21:09 -0000
@@ -199,13 +199,21 @@
 	void (*bast) (long param, int mode);
 	long astparam;
 	uint16_t flags = 0, found;
+	uint32_t debug, debug2 = 0;
 
 	for (;;) {
+		if (++debug2 > 20000)
+			printk("ast for stuck\n");
+		debug = 0;
 		found = FALSE;
 		down(&ast_queue_lock);
 		list_for_each_entry(lkb, &ast_queue, lkb_astqueue) {
 			rsb = lkb->lkb_resource;
 			ls = rsb->res_ls;
+			if (++debug > 10000)
+				printk("ast foreach stuck lkb %x %x rsb %s\n",
+				       lkb->lkb_id, lkb->lkb_astflags,
+				       rsb->res_name);
 
 			/* don't deliver ast's for locks in lockspaces being recovered */
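The patch is an iteration-count watchdog. Each loop gets a counter, and once the counter passes a threshold (more than 20000 passes of the outer for (;;) loop, or more than 10000 entries in a single walk of ast_queue) it calls printk(). The log then shows which loop is spinning and, for the inner one, which lkb and resource it is sitting on. Here is the same idea as a minimal userspace C sketch. struct node and walk_with_watchdog() are hypothetical stand-ins for the lkb list walk, not DLM code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical list node standing in for an lkb on ast_queue. */
struct node {
	unsigned int id;
	struct node *next;
};

/*
 * Walk a list while counting iterations.  If the count passes
 * `limit`, report the node the walk is on and give up.
 * Returns the number of nodes visited, or -1 if the limit was hit
 * (for example because the list has become circular).
 */
long walk_with_watchdog(const struct node *head, unsigned long limit)
{
	unsigned long count = 0;
	const struct node *n;

	assert(limit > 0);	/* a zero limit would flag every non-empty list */

	for (n = head; n; n = n->next) {
		if (++count > limit) {
			fprintf(stderr, "walk stuck at node %x after %lu steps\n",
				n->id, limit);
			return -1;
		}
	}
	return (long)count;
}
```

One design difference: the patch keeps calling printk() on every pass once a counter is over its threshold, which can flood the log. This sketch reports once and returns. That only works because a userspace caller can stop walking, whereas the patch deliberately leaves dlm_astd's behavior unchanged.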