RE: 3 node cluster crashes


 



Hi,

 

I have a problem with rgmanager’s script resource.

My script uses $OCF_RESKEY_service_name in the following way:

 

<script file="/usr/local/sbin/cl2r.sh" name="script-VG" service_name="VOLUME GROUP">

 

The script operates on the volume group named in service_name (here, VOLUME GROUP).
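A minimal sketch of how such a script might pick up the attribute, assuming rgmanager exports it as OCF_RESKEY_service_name and calls the script with an init-style action argument. The function names and messages are hypothetical; this is not the actual cl2r.sh.

```shell
#!/bin/sh
# Hypothetical sketch of a script resource; not the real cl2r.sh.

# Print the volume group name passed in by the cluster configuration,
# or fail if the service_name attribute was not provided.
get_volume_group() {
    if [ -z "${OCF_RESKEY_service_name:-}" ]; then
        echo "service_name attribute is not set" >&2
        return 1
    fi
    printf '%s\n' "$OCF_RESKEY_service_name"
}

# Dispatch on the action argument (assumed to be start, stop or status).
handle_action() {
    vg=$(get_volume_group) || return 1
    case "$1" in
        start)  echo "would activate volume group: $vg" ;;
        stop)   echo "would deactivate volume group: $vg" ;;
        status) echo "would check volume group: $vg" ;;
        *)      echo "usage: start|stop|status" >&2; return 2 ;;
    esac
}
```

A real script would end with something like `handle_action "$1"` and replace the echo lines with the actual volume group commands.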

 

If I define multiple services that use the same script, I get:

clurgmgrd[10143]: <err> Unique attribute collision. type=script attr=file value=/usr/local/sbin/cl2r.sh

 

Checking /usr/share/cluster/script.sh I found:

 

        <parameter name="file" unique="1" required="1">

            <longdesc lang="en">

                Path to script

            </longdesc>

            <shortdesc lang="en">

                Path to script

            </shortdesc>

            <content type="string"/>

        </parameter>
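To see which file values would trigger this collision, the configuration can be scanned for script resources that share a path. This is a rough sketch, assuming each <script> element sits on a single line; the function name duplicate_script_files is hypothetical, and this is not how rgmanager itself performs the check.

```shell
#!/bin/sh
# List every file="..." value that appears on more than one <script> element.
# Reads the named files, or standard input when no file is given.
duplicate_script_files() {
    # Extract the file attribute from lines containing a <script ...> tag,
    # then keep only the values that occur at least twice.
    sed -n 's/.*<script[^>]*[[:space:]]file="\([^"]*\)".*/\1/p' "$@" |
        sort |
        uniq -d
}
```

For example, `duplicate_script_files /etc/cluster/cluster.conf` would print /usr/local/sbin/cl2r.sh if two services reference it.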

 

The latest version still sets it (line 40):

 

http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=blob;f=rgmanager/src/resources/script.sh;h=41298115ccd39863f9f45d5f889e3b6299b3659d;hb=refs/heads/STABLE2#l40

 

Do you know why the file parameter of the script resource is set to unique?

Could it be changed to unique="0"?

 

Best regards,

Norbert Németh

 

From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Dalton, Maurice
Sent: Tuesday, August 05, 2008 11:56 PM
To: linux-cluster@xxxxxxxxxx
Subject: 3 node cluster crashes

 

 

I have a 3 node cluster running cman-2.0.84-2.el5. At times we have spanning tree events that cause network storms lasting up to 9 seconds.

When these events occur (today we triggered them twice to verify the issue), all three nodes go down within seconds.

 

The second time we tried it I added the totem token statement shown below. Same problem.

 

 

 

 

 

<cman>

        <multicast addr="225.0.0.11"/>

        <totem token="21000"/>

</cman>
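The totem token value is given in milliseconds, so 21000 is 21 seconds, longer than the 9 second storms described above. A small sketch of that comparison follows; the function name token_covers_outage is hypothetical, and the check only means something if openais actually picks up the configured value.

```shell
#!/bin/sh
# Succeed if a totem token timeout (milliseconds) is longer than
# an expected network outage (seconds).
token_covers_outage() {
    token_ms=$1
    outage_s=$2
    [ "$token_ms" -gt $((outage_s * 1000)) ]
}

# Values taken from the message: token="21000", storms of up to 9 seconds.
if token_covers_outage 21000 9; then
    echo "token timeout is longer than the outage"
else
    echo "token timeout is shorter than the outage"
fi
```

The token timeout is only one of several membership timers, so passing this check does not by itself guarantee that the ring survives the storm.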

 

 

 

Aug  5 16:41:18 csarcsys2-eth0 ntpd[3484]: kernel time sync enabled 0001

Aug  5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] The token was lost in the OPERATIONAL state.

Aug  5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).

Aug  5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).

Aug  5 16:41:19 csarcsys2-eth0 openais[3096]: [TOTEM] entering GATHER state from 2.

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering GATHER state from 0.

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Creating commit token because I am the rep.

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Saving state aru 46 high seq received 46

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Storing new sequence id for ring b50

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering COMMIT state.

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering RECOVERY state.

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] position [0] member 172.xx.xx.xxx:

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] previous ring seq 2892 rep 172.xx.xxx.xx

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] aru 46 high delivered 46 received flag 1

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Did not need to originate any messages in recovery.

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] Sending initial ORF token

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] CLM CONFIGURATION CHANGE

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] New Configuration:

Aug  5 16:41:24 csarcsys2-eth0 kernel: dlm: closing connection to node 1

Aug  5 16:41:24 csarcsys2-eth0 clurgmgrd[3750]: <emerg> #1: Quorum Dissolved

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ]   r(0) ip(172. xx.xxx.xx)

Aug  5 16:41:24 csarcsys2-eth0 kernel: dlm: closing connection to node 3

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] Members Left:

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ]   r(0) ip(172. xx.xxx.xx)

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ]   r(0) ip(172. xx.xxx.xx)

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] Members Joined:

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CMAN ] quorum lost, blocking activity

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] CLM CONFIGURATION CHANGE

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] New Configuration:

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ]   r(0) ip(172. xx.xxx.xx)

Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Cluster is not quorate.  Refusing connection.

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] Members Left:

Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Error while processing connect: Connection refused

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] Members Joined:

Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Invalid descriptor specified (-111).

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [SYNC ] This node is within the primary component and will provide service.

Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Someone may be attempting something evil.

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [TOTEM] entering OPERATIONAL state.

Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Error while processing get: Invalid request descriptor

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CLM  ] got nodejoin message 172.24.86.143

Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Cluster is not quorate.  Refusing connection.

Aug  5 16:41:24 csarcsys2-eth0 openais[3096]: [CPG  ] got joinlist message from node 2

Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Error while processing connect: Connection refused

Aug  5 16:41:24 csarcsys2-eth0 ccsd[3031]: Invalid descriptor specified (-111).
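To line up token loss against quorum loss in a longer log, the relevant messages can be filtered out. This is a rough sketch that matches only the message strings visible in the excerpt above; the function name quorum_timeline is hypothetical.

```shell
#!/bin/sh
# Print only the log lines that mark token loss or quorum loss.
# Reads the named files, or standard input when no file is given.
quorum_timeline() {
    grep -E 'token was lost|Quorum Dissolved|quorum lost' "$@"
}
```

For example, `quorum_timeline /var/log/messages` on each node would show how many seconds passed between the lost token and the quorum loss.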


