Hi Digimer.

Below is the information from the second node's log file; the configuration is on its way. Thanks

Nov 11 00:12:47 qdiskd[6704]: <notice> Writing eviction notice for node 1
Nov 11 00:12:47 kernel: CMAN: removing node node1hb from the cluster : Killed by another node
Nov 11 00:12:49 qdiskd[6704]: <notice> Node 1 evicted
Nov 11 00:12:55 fenced[6771]: node1hb not a cluster member after 8 sec post_fail_delay
Nov 11 00:12:55 fenced[6771]: fencing node "node1hb"
Nov 11 00:14:00 ccsd[6603]: Attempt to close an unopened CCS descriptor (5462880).
Nov 11 00:14:00 ccsd[6603]: Error while processing disconnect: Invalid request descriptor
Nov 11 00:14:00 fenced[6771]: fence "node1hb" success
Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Trying to acquire journal lock...
Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Looking at journal...
Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Acquiring the transaction lock...
Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Replaying journal...
Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Replayed 4 of 4 blocks
Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: replays = 4, skips = 0, sames = 0
Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Journal replayed in 1s
Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Done
Nov 11 00:14:07 clurgmgrd[6833]: <info> Magma Event: Membership Change
Nov 11 00:14:07 clurgmgrd[6833]: <info> State change: node1hb DOWN
Nov 11 00:16:59 kernel: CMAN: node node1hb rejoining
Nov 11 00:17:08 clurgmgrd[6833]: <info> Magma Event: Membership Change
Nov 11 00:17:08 clurgmgrd[6833]: <info> State change: node1hb UP

-----Original Message-----
From: Digimer [mailto:lists@xxxxxxxxxx]
Sent: Monday, 12 November, 2012 9:36 AM
To: linux clustering
Cc: Kalam, Imran
Subject: Re: Cluster node1 rebooted itself

It's hard to make much of a guess given that your cluster configuration is unknown. That said, it would seem that something interrupted comms.

What is in the syslog of node 2 for the same time period? Can you share your cluster.conf, please (obfuscating only passwords)?

On 11/11/2012 05:32 PM, Kalam, Imran wrote:
> Hi All.
>
> I have a 2-node GFS cluster running RHAS4 update 5, kernel 2.6.9-55.ELsmp.
> On Sunday morning node1 (master) rebooted itself, and I could
> only see the following in the message log file. Has anyone experienced
> the same problem? Please let me know if you need more information. Thanks
>
> Nov 11 00:12:47 kernel: CMAN: Being told to leave the cluster by node 2
> Nov 11 00:12:47 kernel: CMAN: we are leaving the cluster.
> Nov 11 00:12:47 kernel: WARNING: dlm_emergency_shutdown
> Nov 11 00:12:47 kernel: WARNING: dlm_emergency_shutdown
> Nov 11 00:12:47 kernel: SM: 00000002 sm_stop: SG still joined
> Nov 11 00:12:47 kernel: SM: 01000003 sm_stop: SG still joined
> Nov 11 00:12:47 kernel: SM: 02000007 sm_stop: SG still joined
> Nov 11 00:12:47 kernel: SM: 03000004 sm_stop: SG still joined
> Nov 11 00:12:47 clurgmgrd[6872]: <warning> #67: Shutting down uncleanly
> Nov 11 00:12:47 ccsd[6613]: Cluster manager shutdown. Attemping to reconnect...
> Nov 11 00:12:48 ccsd[6613]: Cluster is not quorate. Refusing connection.
> Nov 11 00:12:48 ccsd[6613]: Error while processing connect: Connection refused
> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-111).
> Nov 11 00:12:48 ccsd[6613]: Someone may be attempting something evil.
> Nov 11 00:12:48 ccsd[6613]: Error while processing get: Invalid request descriptor
> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-111).
> Nov 11 00:12:48 ccsd[6613]: Someone may be attempting something evil.
> Nov 11 00:12:48 ccsd[6613]: Error while processing get: Invalid request descriptor
> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-21).
> Nov 11 00:12:48 ccsd[6613]: Someone may be attempting something evil.
> Nov 11 00:12:48 ccsd[6613]: Error while processing disconnect: Invalid request descriptor
> Nov 11 00:12:48 clurgmgrd: [6872]: <info> unmounting /dev/mapper/vg_shared-lv00 (/opt/xxshare)
> Nov 11 00:12:48 ccsd[6613]: Cluster is not quorate. Refusing connection.
> Nov 11 00:12:48 ccsd[6613]: Error while processing connect: Connection refused
> Nov 11 00:12:48 ccsd[6613]: Cluster is not quorate. Refusing connection.
> Nov 11 00:12:48 ccsd[6613]: Error while processing connect: Connection refused
> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-111).
> Nov 11 00:12:48 ccsd[6613]: Someone may be attempting something evil.
> Nov 11 00:12:48 ccsd[6613]: Error while processing get: Invalid request descriptor
> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-111).
>
> *Regards*
> Imran Kalam
> Technical Specialist
> Post IT
> Corporate Services
> Australia Post
> Level 2, 185 Rosslyn St. West Melbourne
> Phone: (03) 9322 0382
> Fax: 9204 7303
> Mob: 0439 559 461

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
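
Since the question above is what node 2 logged during the same time period, it can help to interleave the two excerpts into one timeline. The following is only a minimal sketch, not anything either poster ran: the function names `parse_line` and `merge_logs` are hypothetical, the year 2012 is assumed from the Sent: header (syslog timestamps carry no year), and the sample lines are a handful copied from the excerpts in this thread.

```python
from datetime import datetime

def parse_line(line, node, year=2012):
    """Split one classic syslog line into (timestamp, node, message).

    Assumes the 15-character "Mon DD HH:MM:SS" prefix used in this thread;
    the year is an assumption because syslog does not record it.
    """
    line = line.lstrip("> ").strip()      # drop email quote markers
    stamp, message = line[:15], line[16:]
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return when, node, message

def merge_logs(node1_lines, node2_lines, year=2012):
    """Return events from both nodes ordered by timestamp."""
    events = [parse_line(l, "node1", year) for l in node1_lines if l.strip()]
    events += [parse_line(l, "node2", year) for l in node2_lines if l.strip()]
    # sorted() is stable, so same-second entries keep their input order.
    return sorted(events, key=lambda event: event[0])

# A few lines copied from the excerpts in this thread.
node1 = [
    "> Nov 11 00:12:47 kernel: CMAN: Being told to leave the cluster by node 2",
]
node2 = [
    "Nov 11 00:12:47 qdiskd[6704]: <notice> Writing eviction notice for node 1",
    'Nov 11 00:12:55 fenced[6771]: fencing node "node1hb"',
    'Nov 11 00:14:00 fenced[6771]: fence "node1hb" success',
]

merged = merge_logs(node1, node2)
for when, node, message in merged:
    print(when.strftime("%H:%M:%S"), node, message)
```

Timestamps only have one-second resolution, so the order of entries within the same second (for example the 00:12:47 eviction notice on node 2 and the "Being told to leave" message on node 1) cannot be established from the logs alone.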