Re: Timeout causing GFS filesystem inaccessibility

Michael Conrad Tadpol Tilstra <mtilstra@xxxxxxxxxx> · Mon, 6 Jun 2005 08:01:48 -0500

On Fri, Jun 03, 2005 at 09:48:12PM -0400, Rich Paredes wrote:
> Assumptions: 3 node cluster. 
> All 3 nodes are lock managers
> Nodes 1 and 2 mount GFS filesystems
> Node 1 during failure is master, node 2 and node 3 are slaves
> 
> Error on node 2 is:
> lock_gulmd_LT000[3608]: Timeout (15000000) on idx: 2 fd:7 (node1:192.168.101.11)
> 
> This error keeps repeating in the logs and the GFS filesystems are totally
> inaccessible.  To fix, the master lock manager needs to be manually
> expired and then rebooted because applications were accessing GFS
> filesystems.
> 
> It looks like the error message is generated from lock_io.c.
> 
> Does anyone know exactly what causes this error?

New sockets have a specific time slot in which they must send a valid
login packet before they are kicked out.  The message you're seeing is
from this.  There should be a matching set of messages from node1 saying
it is trying to log into node2.  (The message might be suppressed, though.
You will probably need to add LoginLoops to the verbosity setting.)
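To illustrate the login-window mechanism described above, here is a minimal Python model of a per-socket login deadline. It is a hypothetical sketch, not lock_gulmd's actual code: the names LoginTracker, accept, login and reap are invented for illustration, and reading the 15000000 in the log line as a microsecond count (15 seconds) is an assumption. It only tracks timestamps; a real server would also close the fd and log the timeout.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: the "Timeout (15000000)" value is in microseconds, i.e. 15 s.
LOGIN_TIMEOUT_US = 15_000_000


@dataclass
class PendingSocket:
    fd: int
    peer: str
    accepted_at_us: int
    logged_in: bool = False


@dataclass
class LoginTracker:
    """Hypothetical model: each new socket must log in before its deadline."""
    timeout_us: int = LOGIN_TIMEOUT_US
    sockets: Dict[int, PendingSocket] = field(default_factory=dict)

    def accept(self, fd: int, peer: str, now_us: int) -> None:
        # Start the login window when the connection is accepted.
        self.sockets[fd] = PendingSocket(fd, peer, now_us)

    def login(self, fd: int, valid: bool) -> None:
        # Only a valid login packet clears the deadline for this socket.
        if valid and fd in self.sockets:
            self.sockets[fd].logged_in = True

    def reap(self, now_us: int) -> List[PendingSocket]:
        """Drop and return sockets whose login window has expired."""
        expired = [
            s for s in self.sockets.values()
            if not s.logged_in and now_us - s.accepted_at_us >= self.timeout_us
        ]
        for s in expired:
            del self.sockets[s.fd]  # a real server would close(fd) here
        return expired
```

For example, a connection from node1:192.168.101.11 accepted at time 0 that never sends a valid login packet would be returned by reap(15_000_000), while a socket that did log in stays tracked. Whether the real cutoff uses >= or > is not stated in this thread.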

The matching messages on node1 should provide some clues as to why the
timeouts are happening.

-- 
Michael Conrad Tadpol Tilstra
For some inexplicable reason, you just wish it would rain.
--

Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster