Ya, certainly looks like a network problem. If you have a support
contract with Red Hat, you may want to bring them in for a more
detailed review though. I am only guessing based on what you've listed
here.

Cheers

On 11/11/2012 05:48 PM, Kalam, Imran wrote:
> Hi Digimer.
>
> Below is the information from the second node's log file; the configuration is on its way. Thanks
>
> Nov 11 00:12:47 qdiskd[6704]: <notice> Writing eviction notice for node 1
> Nov 11 00:12:47 kernel: CMAN: removing node node1hb from the cluster : Killed by another node
> Nov 11 00:12:49 qdiskd[6704]: <notice> Node 1 evicted
> Nov 11 00:12:55 fenced[6771]: node1hb not a cluster member after 8 sec post_fail_delay
> Nov 11 00:12:55 fenced[6771]: fencing node "node1hb"
> Nov 11 00:14:00 ccsd[6603]: Attempt to close an unopened CCS descriptor (5462880).
> Nov 11 00:14:00 ccsd[6603]: Error while processing disconnect: Invalid request descriptor
> Nov 11 00:14:00 fenced[6771]: fence "node1hb" success
> Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Trying to acquire journal lock...
> Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Looking at journal...
> Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Acquiring the transaction lock...
> Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Replaying journal...
> Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Replayed 4 of 4 blocks
> Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: replays = 4, skips = 0, sames = 0
> Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Journal replayed in 1s
> Nov 11 00:14:07 kernel: GFS: fsid=EMS_cluster1:opt-xxxshare.1: jid=0: Done
> Nov 11 00:14:07 clurgmgrd[6833]: <info> Magma Event: Membership Change
> Nov 11 00:14:07 clurgmgrd[6833]: <info> State change: node1hb DOWN
> Nov 11 00:16:59 kernel: CMAN: node node1hb rejoining
> Nov 11 00:17:08 clurgmgrd[6833]: <info> Magma Event: Membership Change
> Nov 11 00:17:08 clurgmgrd[6833]: <info> State change: node1hb UP
>
> -----Original Message-----
> From: Digimer [mailto:lists@xxxxxxxxxx]
> Sent: Monday, 12 November, 2012 9:36 AM
> To: linux clustering
> Cc: Kalam, Imran
> Subject: Re: Cluster node1 rebooted itself
>
> It's hard to make much of a guess given that your cluster configuration
> is unknown. That said, it would seem that something interrupted comms.
> What is in the syslog of node 2 for the same time period? Can you share
> your cluster.conf please (obfuscating only passwords)?
>
> On 11/11/2012 05:32 PM, Kalam, Imran wrote:
>> Hi All.
>>
>> I have a 2-node GFS cluster running RHAS4 update 5, kernel 2.6.9-55.ELsmp.
>> On Sunday morning node1 (master) rebooted itself and I could
>> only see the following in the message log file. Has anyone experienced
>> the same problem? Please let me know if you need more information. Thanks
>>
>> Nov 11 00:12:47 kernel: CMAN: Being told to leave the cluster by node 2
>> Nov 11 00:12:47 kernel: CMAN: we are leaving the cluster.
>> Nov 11 00:12:47 kernel: WARNING: dlm_emergency_shutdown
>> Nov 11 00:12:47 kernel: WARNING: dlm_emergency_shutdown
>> Nov 11 00:12:47 kernel: SM: 00000002 sm_stop: SG still joined
>> Nov 11 00:12:47 kernel: SM: 01000003 sm_stop: SG still joined
>> Nov 11 00:12:47 kernel: SM: 02000007 sm_stop: SG still joined
>> Nov 11 00:12:47 kernel: SM: 03000004 sm_stop: SG still joined
>> Nov 11 00:12:47 clurgmgrd[6872]: <warning> #67: Shutting down uncleanly
>> Nov 11 00:12:47 ccsd[6613]: Cluster manager shutdown. Attemping to reconnect...
>> Nov 11 00:12:48 ccsd[6613]: Cluster is not quorate. Refusing connection.
>> Nov 11 00:12:48 ccsd[6613]: Error while processing connect: Connection refused
>> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-111).
>> Nov 11 00:12:48 ccsd[6613]: Someone may be attempting something evil.
>> Nov 11 00:12:48 ccsd[6613]: Error while processing get: Invalid request descriptor
>> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-111).
>> Nov 11 00:12:48 ccsd[6613]: Someone may be attempting something evil.
>> Nov 11 00:12:48 ccsd[6613]: Error while processing get: Invalid request descriptor
>> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-21).
>> Nov 11 00:12:48 ccsd[6613]: Someone may be attempting something evil.
>> Nov 11 00:12:48 ccsd[6613]: Error while processing disconnect: Invalid request descriptor
>> Nov 11 00:12:48 clurgmgrd: [6872]: <info> unmounting /dev/mapper/vg_shared-lv00 (/opt/xxshare)
>> Nov 11 00:12:48 ccsd[6613]: Cluster is not quorate. Refusing connection.
>> Nov 11 00:12:48 ccsd[6613]: Error while processing connect: Connection refused
>> Nov 11 00:12:48 ccsd[6613]: Cluster is not quorate. Refusing connection.
>> Nov 11 00:12:48 ccsd[6613]: Error while processing connect: Connection refused
>> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-111).
>> Nov 11 00:12:48 ccsd[6613]: Someone may be attempting something evil.
>> Nov 11 00:12:48 ccsd[6613]: Error while processing get: Invalid request descriptor
>> Nov 11 00:12:48 ccsd[6613]: Invalid descriptor specified (-111).
>>
>> *Regards*
>> Imran Kalam
>> Technical Specialist
>> Post IT
>> Corporate Services
>> Australia Post
>> Level 2, 185 Rosslyn St. West Melbourne
>> Phone: (03) 9322 0382
>> Fax: 9204 7303
>> Mob: 0439 559 461
>>
>

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
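
For anyone reading this thread in the archive: when comparing syslogs from two nodes, it can help to put the key events on one timeline and look at the gaps between them. The following is only a rough sketch, not a Red Hat tool. The sample lines are copied from the node 2 log quoted in this thread; the helper names (parse, seconds_between) and the year 2012 (taken from the mail dates, since syslog stamps carry no year) are assumptions made for illustration.

```python
import re
from datetime import datetime

# Sample lines copied from the node 2 log quoted in this thread.
NODE2_LOG = """\
Nov 11 00:12:47 qdiskd[6704]: <notice> Writing eviction notice for node 1
Nov 11 00:12:49 qdiskd[6704]: <notice> Node 1 evicted
Nov 11 00:12:55 fenced[6771]: fencing node "node1hb"
Nov 11 00:14:00 fenced[6771]: fence "node1hb" success
Nov 11 00:16:59 kernel: CMAN: node node1hb rejoining
"""

# timestamp, daemon name, optional [pid], then the message text
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) ([^:\[]+)(?:\[\d+\])?: (.*)$")


def parse(log_text, year=2012):
    """Return a list of (datetime, daemon, message) tuples.

    The year is an assumption passed in by the caller, because
    classic syslog timestamps do not include one.
    """
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a syslog-style line
        stamp, daemon, message = m.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        events.append((when, daemon, message))
    return events


def seconds_between(events, first_substr, second_substr):
    """Seconds from the first message containing first_substr to the
    first message containing second_substr."""
    first = next(w for w, _, msg in events if first_substr in msg)
    second = next(w for w, _, msg in events if second_substr in msg)
    return int((second - first).total_seconds())


if __name__ == "__main__":
    events = parse(NODE2_LOG)
    print("eviction notice -> fencing starts:",
          seconds_between(events, "Writing eviction notice", "fencing node"), "s")
    print("fencing starts -> fence success:",
          seconds_between(events, "fencing node", "success"), "s")
    print("fence success -> node rejoining:",
          seconds_between(events, "success", "rejoining"), "s")
```

Feeding the node 1 log through the same parse() and sorting the combined list by timestamp gives a single view of what each node saw and when, which may make it easier to tell whether the nodes lost contact before the eviction notice was written.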