RE: Network failure results in unstable & fragile cluster environment

Hi Deval,

If you are using iLO fencing, you could try the latest fence package
(1.32.10). I have seen a similar problem, and it was because recent iLO
firmware versions behave a little differently: they attempt a soft
restart instead of a hard reboot.
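
As a quick sanity check, something like the following should show which
fence package is installed and whether the agent really power-cycles a
node. The iLO address and credentials below are placeholders, and option
names can vary a bit between fence releases, so check fence_ilo -h first:

  # Confirm which fence package is installed (1.32.10 or later is what I'd try)
  rpm -q fence

  # Fence node1 by hand through its iLO and watch whether it gets a hard
  # power-cycle or only a graceful OS restart (placeholder address/credentials)
  fence_ilo -a ilo-node1.example.com -l Administrator -p 'secret' -o reboot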

With that fix, at least one of the nodes should get properly killed, and
the surviving node should keep all the services.
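
Also, since this is a two-node cluster where both members can race to
fence each other, here is roughly what the relevant parts of
/etc/cluster/cluster.conf tend to look like for such a setup. Treat it
only as a sketch: the cluster and node names, iLO hostnames, and
credentials are placeholders, and a post_fail_delay on the fence daemon
is just one common way to keep the two nodes from shooting each other at
exactly the same moment.

  <?xml version="1.0"?>
  <cluster name="ha-cluster" config_version="1">
    <!-- Two-node mode: the cluster stays quorate with a single vote -->
    <cman two_node="1" expected_votes="1"/>

    <!-- Wait a few seconds before fencing after a failure, so a short
         network blip does not immediately turn into a fence race -->
    <fence_daemon post_fail_delay="10" post_join_delay="60"/>

    <clusternodes>
      <clusternode name="node1" votes="1">
        <fence>
          <method name="1">
            <device name="ilo-node1"/>
          </method>
        </fence>
      </clusternode>
      <clusternode name="node2" votes="1">
        <fence>
          <method name="1">
            <device name="ilo-node2"/>
          </method>
        </fence>
      </clusternode>
    </clusternodes>

    <!-- fence_ilo device definitions; the attribute used for the iLO
         address (hostname vs. ipaddr) depends on the fence release, so
         check the fence_ilo man page for your version -->
    <fencedevices>
      <fencedevice agent="fence_ilo" name="ilo-node1"
                   hostname="ilo-node1.example.com"
                   login="Administrator" passwd="secret"/>
      <fencedevice agent="fence_ilo" name="ilo-node2"
                   hostname="ilo-node2.example.com"
                   login="Administrator" passwd="secret"/>
    </fencedevices>
  </cluster>

After a change like this, remember to bump config_version and propagate
the file to both nodes (on RHCS 4 that is typically done with ccs_tool
update).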

Hope this helps. Regards,

Javier

> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx 
> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Deval 
> kulshrestha
> Sent: Saturday, February 25, 2006 6:33 AM
> To: 'linux clustering'
> Subject: RE: Network failure results in unstable & fragile cluster environment
> 
> 
> Please help me resolve this problem.
> 
> 
> If the network goes down on node1, node1 starts the services that were 
> not running on it, with the shared storage mount point, even though those 
> services are already running on node2. Since the two nodes can no longer 
> communicate with each other, both end up running the same service on the 
> shared storage mount point. Because of fencing, both nodes then try to 
> kill each other, and both hang at "Stopping Cluster Manager Services...". 
> In /var/log/messages, each node shows "fencing s1, fence successful".
>  
> If we disable fencing instead, then when the network comes back the nodes 
> do not resynchronize with each other. The shared storage mount point stays 
> available to both servers, and if both access the storage at the same time 
> the storage returns I/O errors. As a result the whole setup becomes very 
> unstable and fragile.
> 
> --- Deval kulshrestha
> <deval.kulshrestha@xxxxxxxxxxxxxxx> wrote:
> 
> > Hi
> > 
> > I am struggling to get some help with the following configuration. This 
> > setup is intended to go live in a data center running 24x7x365, so any 
> > issue that makes my environment unstable is very critical here.
> > 
> > My HA cluster setup details:
> > 
> > 1. HP DL 360 G4p server          2 nos.
> > 2. HP MSA 500 G2 (SAN)           1 no.
> > 3. Red Hat Enterprise Linux 4 ES
> > 4. Red Hat Cluster Suite 4
> > 
> > 
> > Each server has an HP SCSI HBA, and the MSA 500 G2 is a SCSI-based SAN. 
> > Both servers are connected to the SAN using SCSI VHDCI cables, and a 
> > network switch provides network connectivity for the servers. I created 
> > a disk array of three HDDs on the SAN with two logical volumes, then 
> > installed RHEL 4 Update 1 on both servers (the servers themselves are 
> > configured with RAID 1), then installed all the HP drivers and 
> > management agents. After the server configuration and OS installation, 
> > I installed Red Hat Cluster Suite 4 on both machines.
> > 
> >  
> > 
> > Then I configured the cluster using the Cluster Configuration Manager: 
> > added the member hosts, configured the fence device and assigned it to 
> > each member host (HP iLO is certified as a fence device), configured a 
> > failover domain with node priority, configured resources such as a 
> > floating IP address, a file system, and a script, and then configured 
> > the service that needs to run in HA mode.
> > 
> >  
> > 
> > After configuring this I tested various scenarios, and HA works 
> > properly: whenever I power off either machine, the services fail over 
> > to the available node.
> > 
> > Problem:
> > 
> > 
> > If the network goes down on node1, node1 starts the services that were 
> > not running on it, with the shared storage mount point, even though 
> > those services are already running on node2. Since the two nodes can no 
> > longer communicate with each other, both end up running the same service 
> > on the shared storage mount point. Because of fencing, both nodes then 
> > try to kill each other, and both hang at "Stopping Cluster Manager 
> > Services...". In /var/log/messages, each node shows "fencing s1, fence 
> > successful".
> > 
> > If we disable fencing instead, then when the network comes back the 
> > nodes do not resynchronize with each other. The shared storage mount 
> > point stays available to both servers, and if both access the storage 
> > at the same time the storage returns I/O errors. As a result the whole 
> > setup becomes very unstable and fragile.
> > 
> > With Regards,
> > 
> > Deval
> > 
> > Progression Infonet Pvt. Ltd.
> > 55, Independent Electronic Modules, 
> > Sector - 18, Electronic City, 
> > Gurgaon - 122015
> > 
> > India
> > Tel    : 0124-2455070, Ext. 215, Fax: 91-124-2398647
> > Mobile : 98186-82509
> > URL    : www.progression.com
> > 
--

Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
