Help - upgrading and rebooting one node causes another node to be fenced

BJ <l_x2828@xxxxxxxxx> · Thu, 21 Feb 2008 08:27:28 -0800 (PST)

Hi,

I would appreciate it if someone could help me with
this.

We have a cluster with three nodes:

NODE1: GFS-6.0.2-25
NODE2: GFS-6.0.2.36-1
NODE3: GFS-6.0.2-25

All three nodes connect to an FC switch, which is used
as the fencing device.  Before we upgraded NODE3,
"gulm_tool nodelist localhost" showed that all three
nodes were in the cluster and that NODE1 was the master.

After we upgraded NODE3 from GFS-6.0.2-25 to
GFS-6.0.2.36-1 and rebooted it, NODE2 missed three
heartbeats and was then fenced (see the log from NODE1
below).  Fencing disabled the FC switch port connected
to NODE2, and the LED on that port changed from green
to yellow.
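In the log below, NODE1 (the master) records NODE2's missed heartbeats about 15 seconds apart, declares NODE2 expired at the third miss, and then runs fence_node. The following is only a hypothetical toy model of that sequence to make the behaviour concrete; it is not lock_gulmd's code, and the class name and the three-miss threshold (MISS_LIMIT) are assumptions read off this one log, not taken from any GFS configuration.

```python
# Hypothetical toy model of the "missed heartbeat -> expired -> fence_node"
# sequence seen in NODE1's log. Monitor and MISS_LIMIT are illustrative
# names only; they are not part of GFS or lock_gulmd.
from dataclasses import dataclass, field

MISS_LIMIT = 3  # assumption: inferred from "mb:3" followed by "expired"


@dataclass
class Monitor:
    misses: dict = field(default_factory=dict)  # node name -> consecutive misses
    fenced: list = field(default_factory=list)  # nodes this model has fenced

    def heartbeat(self, node: str) -> None:
        # A heartbeat that arrives in time resets the consecutive-miss counter.
        self.misses[node] = 0

    def missed(self, node: str) -> bool:
        """Record one missed heartbeat; return True if the node gets fenced."""
        self.misses[node] = self.misses.get(node, 0) + 1
        if self.misses[node] >= MISS_LIMIT:
            self.fenced.append(node)  # stands in for "fence_node NODE2"
            self.misses[node] = 0     # node is logged out, so clear its counter
            return True
        return False


if __name__ == "__main__":
    m = Monitor()
    print([m.missed("NODE2") for _ in range(3)], m.fenced)
```

In this model, three consecutive misses with no heartbeat in between are enough to fence a node, which is why a single heartbeat reaching the master during that window would have reset the count.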

Could someone please tell me why this might happen
and how to avoid it, or what we should check to
troubleshoot this issue?  I'm not very familiar
with GFS yet.

Regards,
BJ

Feb 19 18:45:16 NODE1 lock_gulmd_core[2617]: New Client: idx:4 fd:9 from (192.168.20.111:NODE3)
Feb 19 18:45:17 NODE1 lock_gulmd_LT000[2618]: Attached slave NODE3:192.168.20.111 idx:5 fd:10 (soff:2 connected:0xc)
Feb 19 18:45:18 NODE1 lock_gulmd_LT000[2618]: New Client: idx 6 fd 11 from (192.168.20.111:NODE3)
Feb 19 18:52:46 NODE1 kernel: scsi(0): RSCN database changed -0x3,0x200.
Feb 19 18:52:46 NODE1 kernel: scsi(0): Waiting for LIP to complete...
Feb 19 18:52:46 NODE1 kernel: scsi(0): Topology - (F_Port), Host Loop address 0xffff
Feb 19 18:52:50 NODE1 lock_gulmd_core[2617]: NODE2 missed a heartbeat (time:1203468770103133 mb:1)
Feb 19 18:53:05 NODE1 lock_gulmd_core[2617]: NODE2 missed a heartbeat (time:1203468785123133 mb:2)
Feb 19 18:53:20 NODE1 lock_gulmd_core[2617]: NODE2 missed a heartbeat (time:1203468800143149 mb:3)
Feb 19 18:53:20 NODE1 lock_gulmd_core[2617]: Client (NODE2) expired
Feb 19 18:53:20 NODE1 lock_gulmd_core[4968]: Gonna exec fence_node NODE2
Feb 19 18:53:20 NODE1 lock_gulmd_core[2617]: Forked [4968] fence_node NODE2 with a 0 pause.
Feb 19 18:53:20 NODE1 fence_node[4968]: Performing fence method, Brocade, on NODE2.
Feb 19 18:53:21 NODE1 fence_node[4968]: The agent (fence_brocade) reports: success: portdisable 2
Feb 19 18:53:21 NODE1 fence_node[4968]: Fence of "NODE2" was successful.
Feb 19 18:53:21 NODE1 lock_gulmd_core[2617]: found match on pid 4968, marking node NODE2 as logged out.
Feb 19 18:53:21 NODE1 kernel: lock_gulm: Checking for journals for node "NODE2"
Feb 19 18:53:21 NODE1 kernel: GFS: fsid=MY-CLUSTER:pool_gfs.0: jid=1: Trying to acquire journal lock...
Feb 19 18:53:21 NODE1 kernel: GFS: fsid=MY-CLUSTER:pool_gfs.0: jid=1: Looking at journal...
Feb 19 18:53:21 NODE1 kernel: GFS: fsid=MY-CLUSTER:pool_gfs.0: jid=1: Done
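To check heartbeat timing in a log like this, a small script can extract the `time:` and `mb:` fields from each "missed a heartbeat" entry and compute the gap between misses. This is a sketch under two assumptions: each syslog entry is on a single line (as syslog writes it, before any email wrapping), and `time:` is in microseconds since the epoch. The second assumption fits this log: 1203468770103133 corresponds to 00:52:50 UTC on Feb 20, which is 18:52:50 at UTC-6, matching the syslog timestamp. The regex and the function name `heartbeat_gaps` are hypothetical.

```python
import re

# Hypothetical pattern for lock_gulmd_core "missed a heartbeat" entries,
# assuming one syslog entry per line and "time:" in microseconds.
MISS_RE = re.compile(
    r"(?P<node>\S+) missed a heartbeat \(time:(?P<usec>\d+) mb:(?P<mb>\d+)\)"
)


def heartbeat_gaps(lines):
    """Return (node, miss_count, seconds_since_previous_miss) tuples."""
    out = []
    prev = {}  # node -> time (usec) of that node's previous missed heartbeat
    for line in lines:
        m = MISS_RE.search(line)
        if not m:
            continue  # skip non-heartbeat entries (scsi, fence_node, GFS, ...)
        node = m.group("node")
        usec = int(m.group("usec"))
        gap = None if node not in prev else (usec - prev[node]) / 1_000_000
        prev[node] = usec
        out.append((node, int(m.group("mb")), gap))
    return out


LOG = """\
Feb 19 18:52:50 NODE1 lock_gulmd_core[2617]: NODE2 missed a heartbeat (time:1203468770103133 mb:1)
Feb 19 18:53:05 NODE1 lock_gulmd_core[2617]: NODE2 missed a heartbeat (time:1203468785123133 mb:2)
Feb 19 18:53:20 NODE1 lock_gulmd_core[2617]: NODE2 missed a heartbeat (time:1203468800143149 mb:3)
Feb 19 18:53:20 NODE1 lock_gulmd_core[2617]: Client (NODE2) expired
"""

if __name__ == "__main__":
    for node, mb, gap in heartbeat_gaps(LOG.splitlines()):
        print(node, mb, gap)
```

For the three entries above, the gaps come out at about 15.02 seconds each, consistent with the syslog timestamps. Running the same script over a longer log would show whether NODE2's misses began only after the scsi(0) RSCN/LIP messages at 18:52:46.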


--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster