Re: [Openais] Resource group refused to start

Anthony BRODARD <brodard.anthony@xxxxxxxxx> · Tue, 27 Dec 2011 12:28:10 +0100

Hi Dan, 
First, I've subscribed to the new list. Thanks for the information ;)

I've applied your modifications and the cluster works fine. Thanks a lot!

Regards,
Anthony

2011/12/23 Dan Frincu <df.cluster@xxxxxxxxx>

Hi,

On Fri, Dec 23, 2011 at 12:23 PM, Anthony BRODARD <brodard.anthony@xxxxxxxxx> wrote:

> Hi list,
>
> I'm trying to configure corosync + DRBD on 2 servers, bart and lisa.
> DRBD works fine, no problem.
> But for corosync, I have a problem with resources' configuration. DRBD is
> correctly managed, I can move it on each server without any problem.
> But other resources (vip, apache, mysql and filesystem for drbd), which are
> included in a group, refuse to start, and are not displayed in the command
> "crm_mon":
>
> ============
> Last updated: Fri Dec 23 11:10:54 2011
> Stack: openais
> Current DC: bart - partition with quorum
> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
>
> Online: [ bart lisa ]
>
>  Master/Slave Set: ms-drbd-rt
>      Masters: [ bart ]
>      Slaves: [ lisa ]
>
>
> First, the configuration:
>
> [bart]~ # crm configure show
> node bart
> node lisa
> primitive fs-data ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/data" directory="/data/" fstype="ext3"

This primitive needs a monitor operation.
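
A minimal sketch of fs-data with a monitor operation added. The device, directory and fstype are taken from the configuration above; the interval and timeout values are assumptions chosen for illustration and should be tuned for the actual setup.

```
# Assumption: interval="20s" and timeout="40s" are illustrative values only.
primitive fs-data ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/data" directory="/data/" fstype="ext3" \
        op monitor interval="20s" timeout="40s"
```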

> primitive rt-apache2 ocf:heartbeat:apache \
>         params configfile="/etc/apache2/apache2.conf" port="443" \
>         op monitor interval="10" timeout="20s" depth="0" \
>         op stop interval="0" timeout="40" \
>         op start interval="0" timeout="60" \
>         meta is-managed="false"

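You need to remove is-managed="false". A group starts its members in the order they are listed, so while rt-apache2 is unmanaged and stopped, the members listed after it cannot start. Your log shows this: "Unmanaged resource rt-apache2 allocated to 'nowhere'", followed by fs-data, rt-mysql and rt-vip "cannot run anywhere".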
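
As a sketch, the same primitive with only the meta line dropped; everything else is copied unchanged from the configuration above.

```
# Same definition as above, without meta is-managed="false".
primitive rt-apache2 ocf:heartbeat:apache \
        params configfile="/etc/apache2/apache2.conf" port="443" \
        op monitor interval="10" timeout="20s" depth="0" \
        op stop interval="0" timeout="40" \
        op start interval="0" timeout="60"
```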

> primitive rt-drbd ocf:linbit:drbd \
>         params drbd_resource="data" \
>         op monitor interval="15s" ignore_deprecation="true" \
>         op stop interval="0" timeout="100" \
>         op start interval="0" timeout="240"

You need to specify two monitor operations, one for role=Master and one for role=Slave, with different intervals, a bit higher in favor of the Master.
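
A sketch of what the two role-specific monitor operations could look like. The 29s/31s intervals are assumptions; they read "in favor of the Master" as "check the Master slightly more often", which is one interpretation rather than something stated in the thread. The two intervals must differ in any case. The ignore_deprecation attribute from the original monitor line is left out here, which is also an assumption of this sketch.

```
# Assumption: 29s (Master) and 31s (Slave) are illustrative intervals only.
primitive rt-drbd ocf:linbit:drbd \
        params drbd_resource="data" \
        op monitor interval="29s" role="Master" \
        op monitor interval="31s" role="Slave" \
        op stop interval="0" timeout="100" \
        op start interval="0" timeout="240"
```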

> primitive rt-mysql lsb:mysql

I strongly recommend using an OCF resource agent rather than the init script, specifically this one:
https://github.com/fghaas/resource-agents/blob/master/heartbeat/mysql
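
A rough sketch of rt-mysql defined through the OCF agent instead of lsb:mysql. It assumes the agent is installed under the heartbeat provider; the config and datadir paths and all timeouts are hypothetical placeholders, not values from this thread, and need to match the real MySQL installation (with the data directory living on the DRBD-backed filesystem).

```
# Assumption: both paths and every interval/timeout below are placeholders.
primitive rt-mysql ocf:heartbeat:mysql \
        params config="/etc/mysql/my.cnf" datadir="/data/mysql" \
        op monitor interval="20s" timeout="30s" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s"
```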

> primitive rt-vip ocf:heartbeat:IPaddr2 \
>         params ip="10.1.150.150" cidr_netmask="32" \
>         op monitor interval="5s"
> group rt-grp rt-apache2 fs-data rt-mysql rt-vip \
>         meta target-role="Started"
> ms ms-drbd-rt rt-drbd \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> location prefer-rt-bart rt-grp 1: bart
> colocation rt-on-drbd inf: rt-grp ms-drbd-rt:Master
> order drbd-before-rt inf: ms-drbd-rt:promote rt-grp:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         default-resource-stickiness="100" \
>         start-failure-is-fatal="false"

Remove start-failure-is-fatal; if you don't know what it does, don't set it.

>
> When I try "crm resource restart rt-grp", syslog says:
>
> Dec 23 11:16:25 bart cibadmin: [4271]: info: Invoked: cibadmin -Ql -o resources
> Dec 23 11:16:25 bart cibadmin: [4273]: info: Invoked: cibadmin -p -R -o resources
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: - <cib admin_epoch="0" epoch="88" num_updates="1" >
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: - <configuration >
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: - <resources >
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: - <group id="rt-grp" >
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: -   <meta_attributes id="rt-grp-meta_attributes" >
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: -     <nvpair value="Stopped" id="rt-grp-meta_attributes-target-role" />
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: -   </meta_attributes>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: - </group>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: - </resources>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: - </configuration>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: - </cib>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: + <cib admin_epoch="0" epoch="89" num_updates="1" >
> Dec 23 11:16:25 bart crmd: [3315]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: + <configuration >
> Dec 23 11:16:25 bart crmd: [3315]: info: need_abort: Aborting on change to admin_epoch
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: + <resources >
> Dec 23 11:16:25 bart crmd: [3315]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: + <group id="rt-grp" >
> Dec 23 11:16:25 bart crmd: [3315]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: +   <meta_attributes id="rt-grp-meta_attributes" >
> Dec 23 11:16:25 bart crmd: [3315]: info: do_pe_invoke: Query 116: Requesting the current CIB: S_POLICY_ENGINE
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: +     <nvpair value="Started" id="rt-grp-meta_attributes-target-role" />
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: +   </meta_attributes>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: + </group>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: + </resources>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: + </configuration>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: + </cib>
> Dec 23 11:16:25 bart cib: [3311]: info: cib_process_request: Operation complete: op cib_replace for section resources (origin=local/cibadmin/2, version=0.89.1): ok (rc=0)
> Dec 23 11:16:26 bart crmd: [3315]: info: do_pe_invoke_callback: Invoking the PE: query=116, ref=pe_calc-dc-1324635386-87, seq=96, quorate=1
> Dec 23 11:16:26 bart pengine: [3314]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Dec 23 11:16:26 bart pengine: [3314]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> Dec 23 11:16:26 bart cib: [4274]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-63.raw
> Dec 23 11:16:26 bart pengine: [3314]: info: determine_online_status: Node bart is online
> Dec 23 11:16:26 bart pengine: [3314]: info: determine_online_status: Node lisa is online
> Dec 23 11:16:26 bart pengine: [3314]: notice: unpack_rsc_op: Operation rt-drbd:1_monitor_0 found resource rt-drbd:1 active on lisa
> Dec 23 11:16:26 bart pengine: [3314]: notice: group_print:  Resource Group: rt-grp
> Dec 23 11:16:26 bart pengine: [3314]: notice: native_print:      rt-apache2    (ocf::heartbeat:apache):        Stopped  (unmanaged)
> Dec 23 11:16:26 bart pengine: [3314]: notice: native_print:      fs-data   (ocf::heartbeat:Filesystem):    Stopped
> Dec 23 11:16:26 bart pengine: [3314]: notice: native_print:      rt-mysql    (lsb:mysql):    Stopped
> Dec 23 11:16:26 bart pengine: [3314]: notice: native_print:      rt-vip (ocf::heartbeat:IPaddr2):       Stopped
> Dec 23 11:16:26 bart pengine: [3314]: notice: clone_print:  Master/Slave Set: ms-drbd-rt
> Dec 23 11:16:26 bart pengine: [3314]: notice: short_print:      Masters: [ bart ]
> Dec 23 11:16:26 bart cib: [4274]: info: write_cib_contents: Wrote version 0.89.0 of the CIB to disk (digest: cd534ad8c2b3c1f8add883e157966248)
> Dec 23 11:16:26 bart pengine: [3314]: notice: short_print:      Slaves: [ lisa ]
> Dec 23 11:16:26 bart pengine: [3314]: info: master_color: Promoting rt-drbd:0 (Master bart)
> Dec 23 11:16:26 bart pengine: [3314]: info: master_color: ms-drbd-rt: Promoted 1 instances of a possible 1 to master
> Dec 23 11:16:26 bart pengine: [3314]: info: native_color: Unmanaged resource rt-apache2 allocated to 'nowhere': inactive
> Dec 23 11:16:26 bart pengine: [3314]: info: native_merge_weights: fs-data: Rolling back scores from rt-mysql
> Dec 23 11:16:26 bart cib: [4274]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.f8vE7m (digest: /var/lib/heartbeat/crm/cib.JaAGaL)
> Dec 23 11:16:26 bart pengine: [3314]: info: native_color: Resource fs-data cannot run anywhere
> Dec 23 11:16:26 bart pengine: [3314]: info: native_merge_weights: rt-mysql: Rolling back scores from rt-vip
> Dec 23 11:16:26 bart pengine: [3314]: info: native_color: Resource rt-mysql cannot run anywhere
> Dec 23 11:16:26 bart pengine: [3314]: info: native_color: Resource rt-vip cannot run anywhere
> Dec 23 11:16:26 bart pengine: [3314]: info: master_color: Promoting rt-drbd:0 (Master bart)
> Dec 23 11:16:26 bart pengine: [3314]: info: master_color: ms-drbd-rt: Promoted 1 instances of a possible 1 to master
> Dec 23 11:16:26 bart pengine: [3314]: ERROR: create_notification_boundaries: Creating boundaries for ms-drbd-rt
> Dec 23 11:16:26 bart pengine: [3314]: ERROR: create_notification_boundaries: Creating boundaries for ms-drbd-rt
> Dec 23 11:16:26 bart pengine: [3314]: ERROR: create_notification_boundaries: Creating boundaries for ms-drbd-rt
> Dec 23 11:16:26 bart pengine: [3314]: ERROR: create_notification_boundaries: Creating boundaries for ms-drbd-rt
> Dec 23 11:16:26 bart pengine: [3314]: notice: LogActions: Leave resource rt-apache2     (Stopped unmanaged)
> Dec 23 11:16:26 bart pengine: [3314]: notice: LogActions: Leave resource fs-data        (Stopped)
> Dec 23 11:16:26 bart pengine: [3314]: notice: LogActions: Leave resource rt-mysql       (Stopped)
> Dec 23 11:16:26 bart pengine: [3314]: notice: LogActions: Leave resource rt-vip (Stopped)
> Dec 23 11:16:26 bart pengine: [3314]: notice: LogActions: Leave resource rt-drbd:0      (Master bart)
> Dec 23 11:16:26 bart pengine: [3314]: notice: LogActions: Leave resource rt-drbd:1      (Slave lisa)
> Dec 23 11:16:26 bart crmd: [3315]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Dec 23 11:16:26 bart crmd: [3315]: info: unpack_graph: Unpacked transition 17: 0 actions in 0 synapses
> Dec 23 11:16:26 bart crmd: [3315]: info: do_te_invoke: Processing graph 17 (ref=pe_calc-dc-1324635386-87) derived from /var/lib/pengine/pe-input-1084.bz2
> Dec 23 11:16:26 bart pengine: [3314]: info: process_pe_message: Transition 17: PEngine Input stored in: /var/lib/pengine/pe-input-1084.bz2
> Dec 23 11:16:26 bart crmd: [3315]: info: run_graph: ====================================================
> Dec 23 11:16:26 bart crmd: [3315]: notice: run_graph: Transition 17 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-1084.bz2): Complete
> Dec 23 11:16:26 bart crmd: [3315]: info: te_graph_trigger: Transition 17 is now complete
> Dec 23 11:16:26 bart crmd: [3315]: info: notify_crmd: Transition 17 status: done - <null>
> Dec 23 11:16:26 bart crmd: [3315]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Dec 23 11:16:26 bart crmd: [3315]: info: do_state_transition: Starting PEngine Recheck Timer
>
>
> I don't understand what is wrong in my configuration. Do you have any idea?

Additional documentation is available at:
http://www.clusterlabs.org/wiki/DRBD_MySQL_HowTo
http://www.drbd.org/users-guide-8.3/s-pacemaker-crm-drbd-backed-service.html
http://www.linbit.com/en/education/tech-guides/mysql-high-availability-on-the-pacemaker-cluster-stack/
http://www.hastexo.com/content/mysql-high-availability-sprint-launch-pacemaker

P.S.: The mailing list has changed; this is the new one, and it should be used from now on.

HTH,
Dan

>
> Regards
> Anthony
>
> _______________________________________________
> Openais mailing list
> Openais@xxxxxxxxxxxxxxxxxxxxxxxxxx
> https://lists.linuxfoundation.org/mailman/listinfo/openais

--
Dan Frincu
CCNA, RHCE

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss