On Fri, Jun 16, 2006 at 10:38:58PM +0800, ?????? wrote:
> Hi all,
>
> I am running the latest STABLE cluster code on 3 nodes. After about
> 38 hours, one node logged the following:
> <--
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: fatal: assertion "FALSE" failed
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: function = xmote_bh
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: file = /home/sunjw/projects/cluster.STABLE/gfs-kernel/src/gfs/glock.c, line = 1093
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: time = 1150408904
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: about to withdraw from the cluster
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: waiting for outstanding I/O
> Jun 16 06:01:44 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: telling LM to withdraw
> Jun 16 06:01:48 nd04 kernel: lock_dlm: withdraw abandoned memory
> Jun 16 06:01:48 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: withdrawn
> Jun 16 06:01:48 nd04 kernel: GFS: fsid=IPTV:gfs-dm2.1: ret = 0x00000002
> -->
> My test program runs 'df', 'write', 'ls' and 'read'.
> Each node connects directly to a host port on the RAID controller over FC.

Hi,

I've attached a small patch that prints more information (the lock
module's return code, the glock's previous state, and a dump of the
glock itself) and calls BUG() instead of withdrawing.

It would also help to see a dlm lock dump and a gfs_tool lockdump from
the machine after it hits the BUG.

Thanks,
Dave
--- ./glock.c.orig	2006-06-16 11:17:48.313980418 -0500
+++ ./glock.c	2006-06-16 11:31:20.617855661 -0500
@@ -30,6 +30,9 @@
 #include "quota.h"
 #include "recovery.h"
 
+int dump_glock(struct gfs_glock *gl, char *buf, unsigned int size,
+	       unsigned int *count);
+
 /* Must be kept in sync with the beginning of struct gfs_glock */
 struct glock_plug {
 	struct list_head gl_list;
@@ -1090,9 +1093,19 @@
 		spin_unlock(&gl->gl_spin);
 
 	} else {
-		if (gfs_assert_withdraw(sdp, FALSE) == -1)
-			printk("GFS: fsid=%s: ret = 0x%.8X\n",
-			       sdp->sd_fsname, ret);
+		char *buf;
+		unsigned int count = 0;	/* dump_glock() appends at offset *count */
+
+		printk("GFS: fsid=%s: ret = 0x%.8X prev_state = %d\n",
+		       sdp->sd_fsname, ret, prev_state);
+		/* GFP_ATOMIC: don't assume we are allowed to sleep here */
+		buf = kmalloc(4096, GFP_ATOMIC);
+		if (buf) {
+			memset(buf, 0, 4096);
+			dump_glock(gl, buf, 4096, &count);
+			printk("%s\n", buf);
+		}
+		BUG();
 	}
 
 	if (glops->go_xmote_bh)
--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster