On Thu, Feb 08, 2007 at 10:02:50AM -0800, Sridharan Ramaswamy (srramasw) wrote:
> Interesting. While testing GFS with a low journal size and ResourceGroup
> size, I hit the same issue:
>
> Feb 7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: fatal: assertion "x <= length" failed
> Feb 7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: function = blkalloc_internal
> Feb 7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: file = /download/gfs/cluster.cvs-rhel4/gfs-kernel/src/gfs/rgrp.c, line = 1458
> Feb 7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: time = 1170896502
> Feb 7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: about to withdraw from the cluster
> Feb 7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: waiting for outstanding I/O
> Feb 7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: telling LM to withdraw
>
> This happened on a 3-node GFS over a 512M device.
>
> $ gfs_mkfs -t cisco:gfs2 -p lock_dlm -j 3 -J 8 -r 16 -X /dev/hda12
>
> I was using bonnie++ to create about 10K files of 1K each from each of the
> 3 nodes simultaneously.
>
> Looking at the code in rgrp.c, it seems related to a failure to find a
> particular resource group block. Could this be due to the very low RG size
> I'm using (16M)?

This is bz 215793, which has been around for quite a while and has been
very difficult for us to reproduce. Perhaps using a smaller rg size is a
way to reproduce the bug more easily.

Dave

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster