On Thu, 2007-02-08 at 10:02 -0800, Sridharan Ramaswamy (srramasw) wrote:
> Interesting. While testing GFS with a low journal size and resource
> group size, I hit the same issue.

Thanks for all the good info. Will look into it when I'm back in the
office sometime tomorrow.

-- Wendy

> Feb  7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: fatal: assertion "x <= length" failed
> Feb  7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: function = blkalloc_internal
> Feb  7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: file = /download/gfs/cluster.cvs-rhel4/gfs-kernel/src/gfs/rgrp.c, line = 1458
> Feb  7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: time = 1170896502
> Feb  7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: about to withdraw from the cluster
> Feb  7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: waiting for outstanding I/O
> Feb  7 17:01:42 cfs1 kernel: GFS: fsid=cisco:gfs2.2: telling LM to withdraw
>
> This happened on a 3-node GFS on a 512M device:
>
> $ gfs_mkfs -t cisco:gfs2 -p lock_dlm -j 3 -J 8 -r 16 -X /dev/hda12
>
> I was using bonnie++ to create about 10K files of 1K each from each of
> the 3 nodes simultaneously.
>
> Looking at the code in rgrp.c, it seems related to a failure to find a
> particular resource group block. Could this be due to the very low RG
> size I'm using (16M)?
>
> Thanks,
> Sridharan
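The exact bonnie++ flags aren't quoted above, but an invocation matching
that description (about 10K files of 1K each, with the large-file
throughput pass skipped) would look something like the line below; take
the mount point and numbers as illustrative, not as what was actually
run. -n counts files in multiples of 1024 (so 10 means 10240 files),
the max:min sizes pin each file at 1024 bytes, and -s 0 disables the
large-file test:

$ bonnie++ -d /mnt/gfs -s 0 -n 10:1024:1024

Run from all three nodes at once, that is roughly 30,000 inode creations
and deletions hammering those small resource groups.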
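For anyone tracing the assertion: "x <= length" guards the allocator's
walk over a resource group's bitmap buffers. Below is a minimal,
self-contained C sketch that mirrors the shape of that loop in
gfs/rgrp.c:blkalloc_internal() -- it is a paraphrase for illustration,
not the kernel source, and the buffer counts and sizes are made up:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NBUF    4    /* bitmap buffers per resource group (made up) */
#define BUF_LEN 8    /* bytes of bitmap per buffer (made up) */
#define BLKST_FREE 0 /* GFS keeps two bits of state per block */

/* Return the first block at or after "goal" whose 2-bit state matches
 * "state", or -1 if this buffer has none (the gfs_bitfit() role). */
static int bitfit(const uint8_t *bitmap, unsigned len,
                  unsigned goal, unsigned state)
{
        for (unsigned blk = goal; blk < len * 4; blk++) {
                unsigned s = (bitmap[blk / 4] >> ((blk % 4) * 2)) & 3u;
                if (s == state)
                        return (int)blk;
        }
        return -1;
}

/* Mirrors the search loop's shape: start at buffer "buf", wrap around,
 * and keep going until a free block turns up. The assertion is the
 * "x <= length" check from the log: the allocator is only called when
 * the resource group header says free blocks exist, so visiting every
 * buffer without a hit means the header's free count and the bitmaps
 * disagree, and GFS withdraws rather than allocate garbage. */
static unsigned blkalloc(uint8_t bitmaps[NBUF][BUF_LEN],
                         unsigned buf, unsigned goal)
{
        int blk;

        for (unsigned x = 0; ; x++) {
                assert(x <= NBUF);      /* the failing check */
                blk = bitfit(bitmaps[buf], BUF_LEN, goal, BLKST_FREE);
                if (blk >= 0)
                        break;
                buf = (buf + 1) % NBUF; /* wrap to the next buffer */
                goal = 0;
        }
        return buf * BUF_LEN * 4 + (unsigned)blk;
}

int main(void)
{
        uint8_t bitmaps[NBUF][BUF_LEN];

        memset(bitmaps, 0xff, sizeof(bitmaps)); /* all blocks in use... */
        bitmaps[2][3] = 0xf3;                   /* ...except one, in buffer 2 */
        printf("allocated block %u\n", blkalloc(bitmaps, 0, 0));
        return 0;
}

So the assertion firing points at an inconsistency between a resource
group's free-block count and its bitmaps, not merely at a small or full
filesystem. That said, the geometry above is tight: with -j 3 -J 8 -r 16
on a 512M device, the journals take 3 x 8M = 24M, leaving roughly 488M
split into about 30 resource groups of 16M each, so a 30K-small-file run
from three nodes crosses resource group boundaries constantly.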
>
> -----Original Message-----
> > From: linux-cluster-bounces@xxxxxxxxxx
> > [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of
> > rh-cluster@xxxxxxxxxx
> > Sent: Thursday, February 08, 2007 3:35 AM
> > To: linux-cluster@xxxxxxxxxx
> > Subject: GFS1: node gets withdrawn intermittently
> >
> > Hi,
> >
> > for some days now, one node of my 6-node GFS1 cluster has been getting
> > withdrawn. Yesterday I rebooted all nodes; now the problem has moved
> > to another node.
> >
> > The kernel messages are the same every time:
> >
> > GFS: fsid=epsilon:amal.1: fatal: assertion "x <= length" failed
> > GFS: fsid=epsilon:amal.1: function = blkalloc_internal
> > GFS: fsid=epsilon:amal.1: file = /build/buildd/linux-modules-extra-2.6-2.6.17/debian/build/build_amd64_none_amd64_redhat-cluster/gfs/gfs/rgrp.c, line = 1458
> > GFS: fsid=epsilon:amal.1: time = 1170922910
> > GFS: fsid=epsilon:amal.1: about to withdraw from the cluster
> > GFS: fsid=epsilon:amal.1: waiting for outstanding I/O
> > GFS: fsid=epsilon:amal.1: telling LM to withdraw
> > lock_dlm: withdraw abandoned memory
> > GFS: fsid=epsilon:amal.1: withdrawn
> >
> > `gfs_tool df` says:
> >
> > /home:
> >   SB lock proto = "lock_dlm"
> >   SB lock table = "epsilon:affaire"
> >   SB ondisk format = 1309
> >   SB multihost format = 1401
> >   Block size = 4096
> >   Journals = 12
> >   Resource Groups = 1166
> >   Mounted lock proto = "lock_dlm"
> >   Mounted lock table = "epsilon:amal"
> >   Mounted host data = ""
> >   Journal number = 0
> >   Lock module flags =
> >   Local flocks = FALSE
> >   Local caching = FALSE
> >   Oopses OK = FALSE
> >
> >   Type      Total      Used      Free       use%
> >   ------------------------------------------------
> >   inodes    731726     731726    0          100%
> >   metadata  329491     4392      325099     1%
> >   data      75336111   4646188   70689923   6%
> >
> > System:
> > 6 x dual AMD Opteron
> > Kernel 2.6.17-2-amd64
> > 32-bit userland
> > Storage attached via QLogic fibre channel (qla2xxx), no serious
> > problems there
> > No LVM
> >
> > Kind regards,
> >
> > menole

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster