Hi,

This sounds like something that someone on the openais list would know. I've CC'd the openais list.

-- Lon

On Fri, 2008-06-27 at 16:03 +1000, Bevan Broun wrote:
> Hi All
>
> I have a 2 node RHEL-5.1 cluster. A quorum disk is configured.
> The hosts have 4 NICs. These are bonded:
> (eth0+eth2) -> bond0
> (eth1+eth3) -> bond1
> Unfortunately I was not able to use a dedicated interface for cluster communications - bond1 is being used. This is where I think I'm in trouble.
>
> The cluster has been configured using IP addresses. I did have to use http://archives.free.net.ph/message/20080130.074958.5c7a211c.en.html
> as the hostname is related to the bond0 IP.
>
> I have not defined the interface to be used by the cluster, just relying on the IP address configured.
> The cluster's purpose is 2 GFS file systems.
>
> The cluster was configured and working for 4 days before there were problems.
>
> I now have almost constant token-lost messages in /var/log/messages. They are almost exactly 5 minutes apart. A typical bit of the messages file is shown below my sig.
>
> Just before the problem started, a Samba message shows nmbd becoming local master browser for a workgroup on the interface used for cluster communications.
>
> Jun 20 13:39:27 HOST1 nmbd[24506]: [2008/06/20 13:39:27, 0] nmbd/nmbd_become_lmb.c:become_local_master_stage2(396)
> Jun 20 13:39:27 HOST1 nmbd[24506]: *****
> Jun 20 13:39:27 HOST1 nmbd[24506]:
> Jun 20 13:39:27 HOST1 nmbd[24506]: Samba name server NBM1 is now a local master browser for workgroup SMS_DOMAIN on subnet 162.16.96.229
> Jun 20 13:39:27 HOST1 nmbd[24506]:
> Jun 20 13:39:27 HOST1 nmbd[24506]: *****
> Jun 20 13:43:27 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.
>
> "cman_tool status" shows both nodes and looks normal. Looks like clvmd is not happy; df commands are hanging.
>
> Could nmbd be causing this token loss? Any ideas on how to proceed?
>
> (names and IPs have been changed).
>
> Thanks
>
> Bevan Broun
> Solutions Architect
> Ardec International
> http://www.ardec.com.au
> http://www.lisasoft.com
> http://www.terrapages.com
> Sydney
> -----------------------
> Suite 112,The Lower Deck
> 19-21 Jones Bay Wharf
> Pirrama Road, Pyrmont 2009
> Ph: +61 2 8570 5000
> Fax: +61 2 8570 5099
>
>
>
> Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.
> Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
> Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
> Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] entering GATHER state from 2.
> Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
> Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Saving state aru 16 high seq received 16
> Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce34
> Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
> Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.
> Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] entering GATHER state from 4.
> Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
> Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce38
> Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
> Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.
> Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] entering GATHER state from 4.
> Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
> Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce3c
> Jun 20 13:48:51 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
> Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.
> Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] entering GATHER state from 4.
> Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] Creating commit token because I am the rep.
> Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] Storing new sequence id for ring 20ce40
> Jun 20 13:49:01 HOST1 openais[15265]: [TOTEM] entering COMMIT state.
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] entering RECOVERY state.
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] position [0] member 162.16.96.229:
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] previous ring seq 2149936 rep 162.16.96.229
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] aru 16 high delivered 16 received flag 1
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] position [1] member 162.16.96.230:
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] previous ring seq 2149936 rep 162.16.96.229
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] aru 16 high delivered 16 received flag 1
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] Did not need to originate any messages in recovery.
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] Sending initial ORF token
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] CLM CONFIGURATION CHANGE
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] New Configuration:
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] r(0) ip(162.16.96.229)
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] r(0) ip(162.16.96.230)
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] Members Left:
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] Members Joined:
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] CLM CONFIGURATION CHANGE
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] New Configuration:
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] r(0) ip(162.16.96.229)
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] r(0) ip(162.16.96.230)
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] Members Left:
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] Members Joined:
> Jun 20 13:49:06 HOST1 openais[15265]: [SYNC ] This node is within the primary component and will provide service.
> Jun 20 13:49:06 HOST1 openais[15265]: [TOTEM] entering OPERATIONAL state.
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] got nodejoin message 162.16.96.229
> Jun 20 13:49:06 HOST1 openais[15265]: [CLM ] got nodejoin message 162.16.96.230
> Jun 20 13:49:06 HOST1 openais[15265]: [CPG ] got joinlist message from node 2
> Jun 20 13:49:06 HOST1 openais[15265]: [CPG ] got joinlist message from node 1
> Jun 20 13:53:38 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.
>
> The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
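
The quoted report says the token-lost messages are almost exactly 5 minutes apart. One way to check that against a full messages file is to pull out the OPERATIONAL-state losses and print the gaps between them. The following is only a rough sketch: the function name `token_loss_gaps`, the `SAMPLE` list, and the assumed year 2008 (syslog timestamps carry no year) are illustrative choices, not part of openais or of the original post. The sample lines are copied from the quoted log.

```python
import re
from datetime import datetime

# Matches syslog lines such as:
# "Jun 20 13:43:27 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state."
TOKEN_LOST = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ openais\[\d+\]: "
    r"\[TOTEM\] The token was lost in the OPERATIONAL state\."
)


def token_loss_gaps(lines, year=2008):
    """Return the seconds between consecutive OPERATIONAL-state token losses.

    `year` is an assumption: syslog timestamps omit it, so one is supplied
    here only to make the timestamps parseable.
    """
    stamps = []
    for line in lines:
        match = TOKEN_LOST.match(line)
        if match:
            stamps.append(
                datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            )
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# Lines taken from the quoted log; the COMMIT-state loss is ignored on purpose.
SAMPLE = [
    "Jun 20 13:43:27 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.",
    "Jun 20 13:48:31 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.",
    "Jun 20 13:48:41 HOST1 openais[15265]: [TOTEM] The token was lost in the COMMIT state.",
    "Jun 20 13:53:38 HOST1 openais[15265]: [TOTEM] The token was lost in the OPERATIONAL state.",
]

if __name__ == "__main__":
    print(token_loss_gaps(SAMPLE))  # [304, 307]
```

To run it over a real log, pass an open file instead of `SAMPLE`, for example `token_loss_gaps(open("/var/log/messages"))`. Gaps that stay close to 300 seconds would back up the "5 minutes apart" observation and point towards something periodic on the network rather than random packet loss.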