Re: Timeout causing GFS filesystem inaccessibility

Rich Paredes <rparedes@xxxxxxxxx> · Wed, 8 Jun 2005 22:42:56 -0400

I found out that updatedb was running on both nodes at 4:02 am, right
before the problems started.  It was indexing the GFS filesystems because
gfs was not listed as an excluded filesystem type.  Could this explain
the errors?
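
For anyone who wants to check a node for the same condition, here is a
minimal sketch. It assumes updatedb takes its excluded filesystem types
from a PRUNEFS assignment in an updatedb.conf-style file; the helper
names and the sample contents are hypothetical and not taken from the
nodes in this thread.

```python
import re


def prunefs_entries(conf_text):
    """Return the filesystem types named in the PRUNEFS assignment.

    conf_text is the text of an updatedb.conf-style file (assumed format:
    shell-like PRUNEFS="type1 type2 ..." assignments).
    """
    entries = []
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.match(r"(?:export\s+)?PRUNEFS\s*=\s*(.*)$", line)
        if match:
            value = match.group(1).strip().strip("\"'")
            entries = value.split()  # a later assignment replaces an earlier one
    return entries


def is_pruned(conf_text, fstype):
    """True if fstype appears in PRUNEFS.

    The case-insensitive comparison is an assumption made for this sketch.
    """
    return fstype.lower() in (entry.lower() for entry in prunefs_entries(conf_text))


# Hypothetical sample contents, for illustration only.
SAMPLE = 'PRUNEFS="nfs proc devpts"\nPRUNEPATHS="/tmp /var/tmp"\n'

if __name__ == "__main__":
    print(is_pruned(SAMPLE, "gfs"))
    print(is_pruned(SAMPLE.replace("devpts", "devpts gfs"), "gfs"))
```

If the check reports that gfs is not listed, adding it to the PRUNEFS
line on every node that mounts GFS should keep updatedb off those
filesystems.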

On 6/6/05, Michael Conrad Tadpol Tilstra <mtilstra@xxxxxxxxxx> wrote:
> On Fri, Jun 03, 2005 at 09:48:12PM -0400, Rich Paredes wrote:
> > Assumptions: 3 node cluster.
> > All 3 nodes are lock managers
> > Nodes 1 and 2 mount GFS filesystems
> > Node 1 during failure is master, node 2 and node 3 are slaves
> >
> > Error on node 2 is:
> > lock_gulmd_LT000[3608]: Timeout (15000000) on idx: 2 fd:7 (node1:192.168.101.11)
> >
> > This error keeps repeating in the logs and the GFS filesystems are
> > totally inaccessible.  To fix, the master lock manager needs to be
> > manually expired and then rebooted because applications were
> > accessing GFS filesystems.
> >
> > It looks like the error message is generated from lock_io.c.
> >
> > Does anyone know exactly what causes this error?
> 
> New sockets have a specific time slot in which they must send a valid
> login packet before they are kicked out.  The message you're seeing is
> from this.  There should be a matching set of messages from node1 saying
> it is trying to log into node2.  (The message might be suppressed, though.
> You will probably need to add LoginLoops to the verbosity setting.)
> 
> That error message should provide some clues as to why the timeouts are
> happening.
> 
> --
> Michael Conrad Tadpol Tilstra
> For some inexplicable reason, you just wish it would rain.
> 
> 
> --
> 
> Linux-cluster@xxxxxxxxxx
> http://www.redhat.com/mailman/listinfo/linux-cluster