I found out that updatedb was running on both nodes at 4:02 am, right
before the problems occurred. It was indexing the GFS filesystems because
gfs was not listed as an excluded filesystem type. Could this explain the
errors?

On 6/6/05, Michael Conrad Tadpol Tilstra <mtilstra@xxxxxxxxxx> wrote:
> On Fri, Jun 03, 2005 at 09:48:12PM -0400, Rich Paredes wrote:
> > Assumptions: 3 node cluster.
> > All 3 nodes are lock managers
> > Nodes 1 and 2 mount GFS filesystems
> > Node 1 during failure is master, node 2 and node 3 are slaves
> >
> > Error on node 2 is:
> > lock_gulmd_LT000[3608]: Timeout (15000000) on idx: 2 fd:7 (node1:192.168.101.11)
> >
> > This error keeps repeating in the logs and the GFS filesystems are
> > totally inaccessible. To fix, the master lock manager needs to be
> > manually expired and then rebooted because applications were accessing
> > GFS filesystems.
> >
> > It looks like the error message is generated from lock_io.c.
> >
> > Does anyone know exactly what causes this error?
>
> New sockets have a specific time slot in which they must send a valid
> login packet before they are kicked out. The message you're seeing is
> from this. There should be a matching set of messages from node1 saying
> it is trying to log into node2. (The message might be suppressed though.
> You will probably need to add the LoginLoops to the verbosity setting.)
>
> That error message should provide some clues as to why the timeouts are
> happening.
>
> --
> Michael Conrad Tadpol Tilstra
> For some inexplicable reason, you just wish it would rain.
>
> --
> Linux-cluster@xxxxxxxxxx
> http://www.redhat.com/mailman/listinfo/linux-cluster
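
For anyone who wants to check their own nodes for the updatedb issue described at the top of this message: below is a minimal sketch, not a definitive tool, that reports whether gfs is missing from the PRUNEFS list in an updatedb configuration. It assumes the configuration uses a PRUNEFS="..." line as in /etc/updatedb.conf; the function names and the sample contents are hypothetical, and the exact file format on a given system may differ.

```python
import re


def pruned_filesystems(conf_text):
    """Return the set of filesystem types named in PRUNEFS lines.

    Assumes a PRUNEFS="type1 type2 ..." syntax (optionally prefixed by
    `export`); this is an assumption about the config format.
    """
    pruned = set()
    for line in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        match = re.match(r'^(?:export\s+)?PRUNEFS\s*=\s*"?([^"]*)"?\s*$', line)
        if match:
            pruned.update(match.group(1).split())
    return pruned


def missing_prunefs(conf_text, wanted=("gfs",)):
    """Return the filesystem types in `wanted` that are not excluded."""
    pruned = pruned_filesystems(conf_text)
    return [fs for fs in wanted if fs not in pruned]


# Hypothetical sample contents, for illustration only.
sample = 'PRUNEFS="nfs proc devpts"\nPRUNEPATHS="/tmp /var/tmp"\n'
print(missing_prunefs(sample))
```

If the returned list contains gfs, updatedb is not excluding GFS by filesystem type on that node. To check a real node, read the node's own configuration file and pass its text to missing_prunefs.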