On Fri, Jun 03, 2005 at 09:48:12PM -0400, Rich Paredes wrote:
> Assumptions: 3 node cluster.
> All 3 nodes are lock managers
> Nodes 1 and 2 mount GFS filesystems
> Node 1 during failure is master, node 2 and node 3 are slaves
>
> Error on node 2 is:
> lock_gulmd_LT000[3608]: Timeout (15000000) on idx: 2 fd:7 (node1:192.168.101.11)
>
> This error keeps repeating in the logs and GFS filesystem are totally
> inaccessible. To fix, the master lock manager needs to be manually
> expired and then rebooted because applications were accessing GFS
> filesystems.
>
> It looks like error message is generated from lock_io.c.
>
> Does anyone know exactly what causes this error?

New sockets have a specific time slot in which they must send a valid
login packet before they are kicked out. The message you're seeing is
from this.

There should be a matching set of messages from node1 saying it is
trying to log into node2. (The message might be suppressed, though. You
will probably need to add LoginLoops to the verbosity setting.) That
error message should provide some clues as to why the timeouts are
happening.

-- 
Michael Conrad Tadpol Tilstra
For some inexplicable reason, you just wish it would rain.
-- Linux-cluster@xxxxxxxxxx http://www.redhat.com/mailman/listinfo/linux-cluster