Re: GFS failure

David Teigland <teigland@xxxxxxxxxx> · Thu, 15 Jun 2006 13:26:25 -0500

On Thu, Jun 15, 2006 at 07:05:39PM +0200, Anthony wrote:
> Hello,
> 
> Yesterday we had a complete GFS failure: all partitions became
> inaccessible from all 32 nodes, and the whole cluster is still
> inaccessible. Has anyone seen this problem before?
> 
> 
> GFS: Trying to join cluster "lock_gulm", "gen:ir"
> GFS: fsid=gen:ir.32: Joined cluster. Now mounting FS...
> GFS: fsid=gen:ir.32: jid=32: Trying to acquire journal lock...
> GFS: fsid=gen:ir.32: jid=32: Looking at journal...
> GFS: fsid=gen:ir.32: jid=32: Done
> 
> NETDEV WATCHDOG: jnet0: transmit timed out
> ipmi_kcs_sm: kcs hosed: Not in read state for error2
> NETDEV WATCHDOG: jnet0: transmit timed out
> ipmi_kcs_sm: kcs hosed: Not in read state for error2
> 
> GFS: fsid=gen:ir.32: fatal: filesystem consistency error
> GFS: fsid=gen:ir.32:   function = trans_go_xmote_bh
> GFS: fsid=gen:ir.32:   file = /usr/src/build/626614-x86_64/BUILD/gfs-kernel-2.6.9-42/smp/src/gfs/glops.c, line = 542
> GFS: fsid=gen:ir.32:   time = 1150223491
> GFS: fsid=gen:ir.32: about to withdraw from the cluster
> GFS: fsid=gen:ir.32: waiting for outstanding I/O
> GFS: fsid=gen:ir.32: telling LM to withdraw

This looks like
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=164331

which was fixed back in March; the fix should be in the latest RPMs or
source tarball.
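
When lining this withdraw up against logs from the other nodes, it can help
to turn the "time = 1150223491" field into a readable date. A minimal sketch,
assuming that field is seconds since the Unix epoch (the helper name
epoch_to_utc is made up for illustration):

```python
from datetime import datetime, timezone

def epoch_to_utc(seconds):
    """Format an epoch value as a UTC timestamp string.

    Assumption: the GFS "time =" field counts seconds since 1970-01-01 UTC.
    """
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S UTC")

# Value copied from the "time =" line in the quoted log.
print(epoch_to_utc(1150223491))
```

The result is in UTC, so adjust for the nodes' local timezone before
comparing it with syslog timestamps.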

Dave

--

Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster