Network failure makes the cluster environment unstable and fragile

Deval kulshrestha <deval.kulshrestha@xxxxxxxxxxxxxxx> · Fri, 24 Feb 2006 10:25:29 +0530

Hi,

I am looking for help with the following configuration.
This setup is intended to go live in a data center running 24 x 7 x 365, so any
issue that makes the environment unstable is critical.

My HA cluster setup details:

 HP DL 360 G4p (server)          2 nos.
 HP MSA 500 G2 (SAN)             1 no.
 Red Hat Enterprise Linux 4 ES
 Red Hat Cluster Suite 4

Each server has an HP SCSI HBA. The MSA 500 G2 is a SCSI-based SAN, and both
servers are connected to it with SCSI VHDCI cables. A network switch provides
network connectivity for the servers. I created a disk array of three HDDs on
the SAN with two logical volumes, then installed RHEL 4 Update 1 on both
servers (the servers are configured with RAID 1), followed by all HP drivers
and management agents. After server configuration and OS installation, I
installed Red Hat Cluster Suite v4 on both machines.

Then I configured the cluster using the Cluster Configuration Manager: added
the member hosts, configured a fence device and assigned it to the member hosts
(HP iLO is certified as a fence device), configured a failover domain with node
priority, configured resources such as a floating IP address, file system and
script, and then configured the services that need to run in HA mode.

After configuring this I tested various scenarios, and HA works properly:
whenever either machine is powered off, its services fail over to the
available node.
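
To make the expected failover behaviour concrete, here is a minimal Python
sketch of picking a service owner from a failover domain with node priority.
It is only a toy model, not the cluster suite's own logic. The node names and
priority values are hypothetical, and it assumes that a lower priority number
means a more preferred node.

```python
# Toy model of a failover domain with node priority; not cluster suite code.
# Assumption: a lower priority number means a more preferred node.
from typing import Dict, Optional, Set


def pick_owner(priorities: Dict[str, int], online: Set[str]) -> Optional[str]:
    """Return the online domain member with the best (lowest) priority."""
    candidates = [name for name in priorities if name in online]
    if not candidates:
        return None  # no member is left to run the service
    return min(candidates, key=lambda name: priorities[name])


if __name__ == "__main__":
    domain = {"node1": 1, "node2": 2}  # hypothetical names and priorities
    print(pick_owner(domain, {"node1", "node2"}))  # both up: node1
    print(pick_owner(domain, {"node2"}))           # node1 powered off: node2
```

This matches the power-off test: when the preferred node disappears, the
remaining node becomes the owner.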

Problem:

If the network goes down on node1, node1 starts the services that were not
running on it, including the shared storage mount point, even though node2 is
already running those same services with the shared storage mounted. Because
the two nodes cannot communicate with each other, fencing makes each node try
to kill the other. Both of them then hang at “Stopping Cluster manager
Services…”. /var/log/messages shows: fencing s1, fence successful.

If we disable fencing, then:

When the network comes back, the nodes don’t synchronize with each other. The
shared storage mount point is available to both servers, and if they access
the storage at the same time, it gives I/O errors. Hence the entire setup
becomes very unstable and fragile.
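
The two cases described above can be summarised in a small Python sketch. It
is a toy model of the symptoms only, not of how the cluster suite works
internally. The `Node` class and `simulate_partition` function are
hypothetical names, and the model assumes that, with fencing enabled, both
fence requests succeed, which may not match the real timing on the hardware.

```python
# Toy model of the two-node network partition; not cluster suite code.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str
    powered_on: bool = True
    has_mount: bool = False


def simulate_partition(fencing_enabled: bool) -> Dict[str, List[str]]:
    """node2 owns the service, then the nodes lose contact with each other."""
    node1 = Node("node1")
    node2 = Node("node2", has_mount=True)

    if fencing_enabled:
        # Each node believes its peer has failed and decides to fence it.
        # Assumption: both decisions are taken before either fence lands,
        # so both fences fire and neither node keeps running.
        for target in (node2, node1):
            target.powered_on = False
            target.has_mount = False
    else:
        # Without fencing, node1 starts the service and mounts the shared
        # volume while node2 still has it mounted.
        node1.has_mount = True

    nodes = [node1, node2]
    return {
        "running": [n.name for n in nodes if n.powered_on],
        "mounted_by": [n.name for n in nodes if n.has_mount],
    }


if __name__ == "__main__":
    print(simulate_partition(fencing_enabled=True))
    print(simulate_partition(fencing_enabled=False))
```

In the first case no node is left running. In the second case both nodes hold
the shared mount at once, which is the situation that produces the I/O errors.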

With regards,

Deval

Progression Infonet Pvt. Ltd.
55, Independent Electronic Modules,
Sector - 18, Electronic City,
Gurgaon – 122015
India

Tel    : 0124 - 2455070, Ext. 215, Fax: 91-124-2398647
Mobile : 98186 - 82509
URL    : www.progression.com

--

Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster