Hi List,

I have 2 servers, connected with a crossover cable.

My RPMs:

gfs2-utils-0.1.38-1.el5
gfs-utils-0.1.12-1.el5
kmod-gfs2-1.52-1.16.el5
cman-2.0.73-1.el5_1.1

My cluster.conf on both nodes:
---------------------------------------------------------------------------------
<?xml version="1.0"?>
<cluster name="cluster" config_version="2">
  <cman two_node="1" expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="node1" votes="1" nodeid="1">
      <fence>
        <method name="human">
          <device name="human" nodename="node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" votes="1" nodeid="2">
      <fence>
        <method name="human">
          <device name="human" nodename="node2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>
---------------------------------------------------------------------------------

My hosts file on both nodes:

192.168.0.1 node1
192.168.0.2 node2

How I created and mounted the filesystem:

mkfs.gfs2 -p lock_dlm -t cluster:drbd -j 2 /dev/drbd0
mount -t gfs2 -o noatime,nodiratime /dev/drbd0 /test

(Btw: DRBD works fine as Primary/Primary.)

OK, I can use /test on both nodes, write to files, and so on.

cman_tool nodes
---------------------------------------------------------------------------------
Node  Sts   Inc   Joined               Name
   1   M    364   2008-02-26 23:20:16  node1
   2   M    360   2008-02-26 23:20:16  node2

cman_tool status
---------------------------------------------------------------------------------
Version: 6.0.1
Config Version: 3
Cluster Name: cluster
Cluster Id: 34996
Cluster Member: Yes
Cluster Generation: 364
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 6
Flags: 2node
Ports Bound: 0
Node name: node2
Node ID: 2
Multicast addresses: 239.192.136.61
Node addresses: 192.168.0.2

NOW: I power node1 off!

My log on node2 shows:
---------------------------------------------------------------------------------
==> /var/log/messages <==
Feb 26 23:27:22 node2 last message repeated 13 times

==> /var/log/kernel <==
Feb 26 23:27:31 node2 kernel: tg3: eth1: Link is down.
Feb 26 23:27:32 node2 kernel: tg3: eth1: Link is up at 100 Mbps, full duplex.
Feb 26 23:27:32 node2 kernel: tg3: eth1: Flow control is off for TX and off for RX.
Feb 26 23:27:36 node2 kernel: drbd0: PingAck did not arrive in time.
Feb 26 23:27:36 node2 kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Feb 26 23:27:36 node2 kernel: drbd0: Creating new current UUID
Feb 26 23:27:36 node2 kernel: drbd0: asender terminated
Feb 26 23:27:36 node2 kernel: drbd0: short read expecting header on sock: r=-512
Feb 26 23:27:36 node2 kernel: drbd0: tl_clear()
Feb 26 23:27:36 node2 kernel: drbd0: Connection closed
Feb 26 23:27:36 node2 kernel: drbd0: Writing meta data super block now.
Feb 26 23:27:36 node2 kernel: drbd0: conn( NetworkFailure -> Unconnected )
Feb 26 23:27:36 node2 kernel: drbd0: receiver terminated
Feb 26 23:27:36 node2 kernel: drbd0: receiver (re)started
Feb 26 23:27:36 node2 kernel: drbd0: conn( Unconnected -> WFConnection )

==> /var/log/messages <==
Feb 26 23:27:37 node2 last message repeated 3 times
Feb 26 23:27:40 node2 openais[3288]: [TOTEM] The token was lost in the OPERATIONAL state.
Feb 26 23:27:40 node2 openais[3288]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Feb 26 23:27:40 node2 openais[3288]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Feb 26 23:27:40 node2 openais[3288]: [TOTEM] entering GATHER state from 2.
Feb 26 23:27:42 node2 root: Process did not exit cleanly, returned 2 with signal 0
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] entering GATHER state from 0.
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] Creating commit token because I am the rep.
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] Saving state aru 31 high seq received 31
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] Storing new sequence id for ring 170
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] entering COMMIT state.
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] entering RECOVERY state.
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] position [0] member 192.168.0.2:
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] previous ring seq 364 rep 192.168.0.1
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] aru 31 high delivered 31 received flag 1
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] Did not need to originate any messages in recovery.
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] Sending initial ORF token
Feb 26 23:27:44 node2 openais[3288]: [CLM ] CLM CONFIGURATION CHANGE
Feb 26 23:27:44 node2 openais[3288]: [CLM ] New Configuration:
Feb 26 23:27:44 node2 fenced[3307]: node1 not a cluster member after 0 sec post_fail_delay
Feb 26 23:27:44 node2 openais[3288]: [CLM ] r(0) ip(192.168.0.2)
Feb 26 23:27:44 node2 fenced[3307]: fencing node "node1"

==> /var/log/kernel <==
Feb 26 23:27:44 node2 kernel: dlm: closing connection to node 1

==> /var/log/messages <==
Feb 26 23:27:44 node2 openais[3288]: [CLM ] Members Left:
Feb 26 23:27:45 node2 openais[3288]: [CLM ] r(0) ip(192.168.0.1)
Feb 26 23:27:45 node2 fence_manual: Node node1 needs to be reset before recovery can procede. Waiting for node1 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n node1)
Feb 26 23:27:45 node2 openais[3288]: [CLM ] Members Joined:
Feb 26 23:27:45 node2 openais[3288]: [CLM ] CLM CONFIGURATION CHANGE
Feb 26 23:27:45 node2 openais[3288]: [CLM ] New Configuration:
Feb 26 23:27:45 node2 openais[3288]: [CLM ] r(0) ip(192.168.0.2)
Feb 26 23:27:45 node2 openais[3288]: [CLM ] Members Left:
Feb 26 23:27:45 node2 openais[3288]: [CLM ] Members Joined:
Feb 26 23:27:45 node2 openais[3288]: [SYNC ] This node is within the primary component and will provide service.
Feb 26 23:27:45 node2 openais[3288]: [TOTEM] entering OPERATIONAL state.
Feb 26 23:27:45 node2 openais[3288]: [CLM ] got nodejoin message 192.168.0.2
Feb 26 23:27:45 node2 openais[3288]: [CPG ] got joinlist message from node 2
Feb 26 23:27:47 node2 root: Process did not exit cleanly, returned 2 with signal 0
---------------------------------------------------------------------------------

ls /test works, BUT touch /test/testfile hangs.

cman_tool nodes shows
---------------------------------------------------------------------------------
Node  Sts   Inc   Joined               Name
   1   X    364                        node1
   2   M    360   2008-02-26 23:20:16  node2
---------------------------------------------------------------------------------

cman_tool status shows
---------------------------------------------------------------------------------
Version: 6.0.1
Config Version: 3
Cluster Name: cluster
Cluster Id: 34996
Cluster Member: Yes
Cluster Generation: 368
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Quorum: 1
Active subsystems: 6
Flags: 2node
Ports Bound: 0
Node name: node2
Node ID: 2
Multicast addresses: 239.192.136.61
Node addresses: 192.168.0.2
---------------------------------------------------------------------------------

DRBD is not the problem: its state is already Primary (standalone).

Why can't I write to a GFS partition in the "lost node" state?

Now I power node1 back on. DRBD is no problem, it recovers. Once I start cman on node1, the hanging touch completes.

Thanks for any ideas and help.

-Thomas
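
For anyone who wants to compare the two `cman_tool nodes` tables above from a script, here is a minimal sketch. It assumes the column layout shown in this post (Node, Sts, Inc, Joined, Name) and that the Joined column is blank for a node that has left. The function name `parse_cman_nodes` and the `SAMPLE` text are made up for illustration and are not part of cman.

```python
def parse_cman_nodes(text):
    """Parse `cman_tool nodes` output (layout as in this post) into dicts."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and dashed separator lines.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        nodes.append({
            "id": int(fields[0]),
            "status": fields[1],
            "inc": int(fields[2]),
            # Assumption: Joined is empty when the node is not a member.
            "joined": " ".join(fields[3:-1]),
            "name": fields[-1],
        })
    return nodes


# Sample input copied from the second table in this post.
SAMPLE = """\
Node  Sts   Inc   Joined               Name
   1   X    364                        node1
   2   M    360   2008-02-26 23:20:16  node2
"""

if __name__ == "__main__":
    for node in parse_cman_nodes(SAMPLE):
        if node["status"] != "M":
            print("%s is not a member (Sts=%s)" % (node["name"], node["status"]))
```

On a live node the text would come from running `cman_tool nodes` instead of `SAMPLE`. The sketch only reports membership; it does not say whether fencing has completed.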