Re: GFS2 2 Node Cluster - lost Node - Mount not writeable


 



Thomas Börnert wrote:
Hi List,

2 servers - connected with a crossover cable

my rpms:
gfs2-utils-0.1.38-1.el5
gfs-utils-0.1.12-1.el5
kmod-gfs2-1.52-1.16.el5
cman-2.0.73-1.el5_1.1

my cluster.conf on both nodes
---------------------------------------------------------------------------------
<?xml version="1.0"?>
<cluster name="cluster" config_version="2">
<cman two_node="1" expected_votes="1">
</cman>
<clusternodes>

<clusternode name="node1" votes="1" nodeid="1">
         <fence>
                <method name="human">
                        <device name="human" nodename="node1"/>
                </method>
        </fence>
</clusternode>

<clusternode name="node2" votes="1" nodeid="2">
         <fence>
                <method name="human">
                        <device name="human" nodename="node2"/>
                </method>
        </fence>
</clusternode>
</clusternodes>

<fencedevices>
        <fencedevice name="human" agent="fence_manual"/>
</fencedevices>
</cluster>
---------------------------------------------------------------------------------------
my /etc/hosts on both nodes
192.168.0.1	node1
192.168.0.2	node2

my filesystem and mount
mkfs.gfs2 -p lock_dlm -t cluster:drbd -j 2 /dev/drbd0
mount -t gfs2 -o noatime,nodiratime /dev/drbd0 /test
(Btw: DRBD works fine as Primary/Primary.)

OK, I can use /test on both nodes and can write files there,
and so on.

cman_tool nodes
--------------------------------------------------------------------------------------
Node  Sts   Inc   Joined               Name
   1   M    364   2008-02-26 23:20:16  node1
   2   M    360   2008-02-26 23:20:16  node2

cman_tool status
-------------------------------------------------------------------------------------
Version: 6.0.1
Config Version: 3
Cluster Name: cluster
Cluster Id: 34996
Cluster Member: Yes
Cluster Generation: 364
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Quorum: 1
Active subsystems: 6
Flags: 2node
Ports Bound: 0
Node name: node2
Node ID: 2
Multicast addresses: 239.192.136.61
Node addresses: 192.168.0.2

NOW: I power node1 off!

my log on node2 shows:
-----------------------------------------------------------------------------------------
==> /var/log/messages <==
Feb 26 23:27:22 node2 last message repeated 13 times

==> /var/log/kernel <==
Feb 26 23:27:31 node2 kernel: tg3: eth1: Link is down.
Feb 26 23:27:32 node2 kernel: tg3: eth1: Link is up at 100 Mbps, full duplex.
Feb 26 23:27:32 node2 kernel: tg3: eth1: Flow control is off for TX and off for RX.
Feb 26 23:27:36 node2 kernel: drbd0: PingAck did not arrive in time.
Feb 26 23:27:36 node2 kernel: drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Feb 26 23:27:36 node2 kernel: drbd0: Creating new current UUID
Feb 26 23:27:36 node2 kernel: drbd0: asender terminated
Feb 26 23:27:36 node2 kernel: drbd0: short read expecting header on sock: r=-512
Feb 26 23:27:36 node2 kernel: drbd0: tl_clear()
Feb 26 23:27:36 node2 kernel: drbd0: Connection closed
Feb 26 23:27:36 node2 kernel: drbd0: Writing meta data super block now.
Feb 26 23:27:36 node2 kernel: drbd0: conn( NetworkFailure -> Unconnected )
Feb 26 23:27:36 node2 kernel: drbd0: receiver terminated
Feb 26 23:27:36 node2 kernel: drbd0: receiver (re)started
Feb 26 23:27:36 node2 kernel: drbd0: conn( Unconnected -> WFConnection )

==> /var/log/messages <==
Feb 26 23:27:37 node2 last message repeated 3 times
Feb 26 23:27:40 node2 openais[3288]: [TOTEM] The token was lost in the OPERATIONAL state.
Feb 26 23:27:40 node2 openais[3288]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Feb 26 23:27:40 node2 openais[3288]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Feb 26 23:27:40 node2 openais[3288]: [TOTEM] entering GATHER state from 2.
Feb 26 23:27:42 node2 root: Process did not exit cleanly, returned 2 with signal 0
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] entering GATHER state from 0.
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] Creating commit token because I am the rep.
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] Saving state aru 31 high seq received 31
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] Storing new sequence id for ring 170
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] entering COMMIT state.
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] entering RECOVERY state.
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] position [0] member 192.168.0.2:
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] previous ring seq 364 rep 192.168.0.1
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] aru 31 high delivered 31 received flag 1
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] Did not need to originate any messages in recovery.
Feb 26 23:27:44 node2 openais[3288]: [TOTEM] Sending initial ORF token
Feb 26 23:27:44 node2 openais[3288]: [CLM  ] CLM CONFIGURATION CHANGE
Feb 26 23:27:44 node2 openais[3288]: [CLM  ] New Configuration:
Feb 26 23:27:44 node2 fenced[3307]: node1 not a cluster member after 0 sec post_fail_delay
Feb 26 23:27:44 node2 openais[3288]: [CLM  ]       r(0) ip(192.168.0.2)
Feb 26 23:27:44 node2 fenced[3307]: fencing node "node1"

==> /var/log/kernel <==
Feb 26 23:27:44 node2 kernel: dlm: closing connection to node 1

==> /var/log/messages <==
Feb 26 23:27:44 node2 openais[3288]: [CLM  ] Members Left:
Feb 26 23:27:45 node2 openais[3288]: [CLM  ]       r(0) ip(192.168.0.1)
Feb 26 23:27:45 node2 fence_manual: Node node1 needs to be reset before recovery can procede. Waiting for node1 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n node1)
Note this message...

Feb 26 23:27:45 node2 openais[3288]: [CLM  ] Members Joined:
Feb 26 23:27:45 node2 openais[3288]: [CLM  ] CLM CONFIGURATION CHANGE
Feb 26 23:27:45 node2 openais[3288]: [CLM  ] New Configuration:
Feb 26 23:27:45 node2 openais[3288]: [CLM  ]       r(0) ip(192.168.0.2)
Feb 26 23:27:45 node2 openais[3288]: [CLM  ] Members Left:
Feb 26 23:27:45 node2 openais[3288]: [CLM  ] Members Joined:
Feb 26 23:27:45 node2 openais[3288]: [SYNC ] This node is within the primary component and will provide service.
Feb 26 23:27:45 node2 openais[3288]: [TOTEM] entering OPERATIONAL state.
Feb 26 23:27:45 node2 openais[3288]: [CLM  ] got nodejoin message 192.168.0.2
Feb 26 23:27:45 node2 openais[3288]: [CPG  ] got joinlist message from node 2
Feb 26 23:27:47 node2 root: Process did not exit cleanly, returned 2 with signal 0
-------------------------------------------------------------------------------------------------------------

ls /test works

BUT

touch /test/testfile hangs ....

cman_tool nodes shows
------------------------------------------------------------------------------------------------------------------
Node  Sts   Inc   Joined               Name
   1   X    364                        node1
   2   M    360   2008-02-26 23:20:16  node2
-----------------------------------------------------------------------------------------------------------------

cman_tool status shows
-----------------------------------------------------------------------------------------------------------------
Version: 6.0.1
Config Version: 3
Cluster Name: cluster
Cluster Id: 34996
Cluster Member: Yes
Cluster Generation: 368
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Quorum: 1
Active subsystems: 6
Flags: 2node
Ports Bound: 0
Node name: node2
Node ID: 2
Multicast addresses: 239.192.136.61
Node addresses: 192.168.0.2
------------------------------------------------------------------------------------------------------------------

DRBD is not the problem; its state is still Primary (StandAlone).

Why can't I write to the GFS2 partition while the cluster is in this "lost node" state?

Now: I power node1 on!

DRBD is no problem -> it recovers.
Now I start cman,
and my hung touch finally completes ...

Thanks for any ideas and help

-Thomas

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

This is because you are using manual fencing. Fencing is required to ensure that an errant node does not continue to write to the shared filesystem after it has lost communication with the cluster, thereby corrupting the data. The only way to do this is to halt all cluster activity (including granting GFS locks) until the fencing succeeds. "Manual" means that an administrator must intervene and correct the problem before cluster operations can resume.

So when you power off node1, node2 detects the missed heartbeats and tries to fence node1. You now have to manually fence node1 by powering it off (already done in your case) and then do one of the following:

    1) Run the following command to acknowledge that you have manually fenced the node:

              # /sbin/fence_ack_manual -n node1

       OR

    2) Start node1 back up and have it rejoin the cluster.

The danger with manual fencing comes in when you run fence_ack_manual quickly, without properly investigating the issue or actually fencing the node. You may see that the node to be fenced is still up and run the command anyway, without noticing that only the network connection between the nodes has been lost. Both nodes then proceed to write to GFS without being able to communicate, and they quickly corrupt the data. So, when using manual fencing, always take care before running fence_ack_manual.
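For what it's worth, here is a minimal sketch of that checklist on node2, assuming the RHEL5 cluster tools already shown in this thread (cman_tool, group_tool, fence_ack_manual); the exact group_tool output differs between versions, so treat the comments as a guide rather than expected output:

    # confirm the cluster still sees node1 as dead (Sts "X")
    cman_tool nodes

    # confirm the fence domain is still waiting on the failed fence operation
    group_tool ls

    # verify out-of-band (console, power switch, remote management card) that
    # node1 really is powered off or cut off from the shared storage -- do NOT
    # rely on the cluster interconnect alone, since that is exactly what may
    # have failed

    # only then acknowledge the manual fence so DLM/GFS2 recovery can proceed
    fence_ack_manual -n node1

Once the fence is acknowledged (or node1 rejoins the cluster, per option 2), recovery runs and the hung touch on node2 completes.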
John

