Re: [Openais] Resource group refused to start

Hi Dan, 

First, I've subscribed to the new list. Thanks for the information ;)

I've applied your modifications and the cluster works fine. Thanks a lot!

Regards,
Anthony

2011/12/23 Dan Frincu <df.cluster@xxxxxxxxx>
Hi,

On Fri, Dec 23, 2011 at 12:23 PM, Anthony BRODARD
<brodard.anthony@xxxxxxxxx> wrote:
> Hi list,
>
> I'm trying to configure corosync + DRBD on 2 servers, bart and lisa.
> DRBD works fine, no problem.
> But for corosync, I have a problem with resources' configuration. DRBD is
> correctly managed, I can move it on each server without any problem.
> But other resources (vip, apache, mysql and filesystem for drbd), which are
> included in a group, refuse to start, and are not displayed in the command
> "crm_mon" :
>
> ============
> Last updated: Fri Dec 23 11:10:54 2011
> Stack: openais
> Current DC: bart - partition with quorum
> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
>
> Online: [ bart lisa ]
>
>  Master/Slave Set: ms-drbd-rt
>      Masters: [ bart ]
>      Slaves: [ lisa ]
>
>
> First, the configuration :
>
> [bart]~ # crm configure show
> node bart
> node lisa
> primitive fs-data ocf:heartbeat:Filesystem \
>         params device="/dev/drbd/by-res/data" directory="/data/" fstype="ext3"

This needs a monitor operation.
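
As a rough sketch, the primitive with a monitor operation added could look like the following. The interval and timeout values are assumptions and have not been tested on this cluster; adjust them to your environment.

```
primitive fs-data ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/data" directory="/data/" fstype="ext3" \
        op monitor interval="20s" timeout="40s"
```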

> primitive rt-apache2 ocf:heartbeat:apache \
>         params configfile="/etc/apache2/apache2.conf" port="443" \
>         op monitor interval="10" timeout="20s" depth="0" \
>         op stop interval="0" timeout="40" \
>         op start interval="0" timeout="60" \
>         meta is-managed="false"

You need to remove is-managed="false".
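
This is probably what blocks the whole group. The pengine log quoted further down reports "Unmanaged resource rt-apache2 allocated to 'nowhere'", and since group members start in the listed order, the members after rt-apache2 then "cannot run anywhere". A sketch of the same primitive with only the meta attribute dropped, everything else unchanged from the original:

```
primitive rt-apache2 ocf:heartbeat:apache \
        params configfile="/etc/apache2/apache2.conf" port="443" \
        op monitor interval="10" timeout="20s" depth="0" \
        op stop interval="0" timeout="40" \
        op start interval="0" timeout="60"
```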

> primitive rt-drbd ocf:linbit:drbd \
>         params drbd_resource="data" \
>         op monitor interval="15s" ignore_deprecation="true" \
>         op stop interval="0" timeout="100" \
>         op start interval="0" timeout="240"

Need to specify 2 monitor operations, one for role=Master one for
role=Slave with different intervals, a bit higher in favor of the
Master.
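
A minimal sketch of what that could look like. The 29s/31s intervals are assumed values, chosen only so that the two operations differ and the Master is checked slightly more often; the ignore_deprecation attribute from the original monitor operation is left out here, which is also an assumption.

```
primitive rt-drbd ocf:linbit:drbd \
        params drbd_resource="data" \
        op monitor interval="29s" role="Master" \
        op monitor interval="31s" role="Slave" \
        op stop interval="0" timeout="100" \
        op start interval="0" timeout="240"
```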

> primitive rt-mysql lsb:mysql

I strongly recommend using an OCF RA, not the init script.
Specifically this one
https://github.com/fghaas/resource-agents/blob/master/heartbeat/mysql
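
For illustration only, a primitive using the OCF agent might look like this. Every path below is an assumption (a Debian-style layout with the data directory on the DRBD-backed /data mount), as are the timeouts; check them against your installation before using any of it.

```
primitive rt-mysql ocf:heartbeat:mysql \
        params binary="/usr/bin/mysqld_safe" config="/etc/mysql/my.cnf" \
        datadir="/data/mysql" pid="/var/run/mysqld/mysqld.pid" \
        socket="/var/run/mysqld/mysqld.sock" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="20s" timeout="30s"
```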

> primitive rt-vip ocf:heartbeat:IPaddr2 \
>         params ip="10.1.150.150" cidr_netmask="32" \
>         op monitor interval="5s"
> group rt-grp rt-apache2 fs-data rt-mysql rt-vip \
>         meta target-role="Started"
> ms ms-drbd-rt rt-drbd \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> location prefer-rt-bart rt-grp 1: bart
> colocation rt-on-drbd inf: rt-grp ms-drbd-rt:Master
> order drbd-before-rt inf: ms-drbd-rt:promote rt-grp:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         default-resource-stickiness="100" \
>         start-failure-is-fatal="false"

Remove start-failure-is-fatal; if you don't know what it does, don't set it.
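
As a sketch, the property section with only that option removed, all other values kept exactly as in the original:

```
property $id="cib-bootstrap-options" \
        dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        default-resource-stickiness="100"
```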

>
>
> When I try "crm resource restart rt-grp", syslog says:
>
> Dec 23 11:16:25 bart cibadmin: [4271]: info: Invoked: cibadmin -Ql -o
> resources
> Dec 23 11:16:25 bart cibadmin: [4273]: info: Invoked: cibadmin -p -R -o
> resources
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: - <cib
> admin_epoch="0" epoch="88" num_updates="1" >
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: -
> <configuration >
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: -
> <resources >
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: -
> <group id="rt-grp" >
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: -
>   <meta_attributes id="rt-grp-meta_attributes" >
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: -
>     <nvpair value="Stopped" id="rt-grp-meta_attributes-target-role" />
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: -
>   </meta_attributes>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: -
> </group>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: -
> </resources>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: -
> </configuration>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: - </cib>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: + <cib
> admin_epoch="0" epoch="89" num_updates="1" >
> Dec 23 11:16:25 bart crmd: [3315]: info: abort_transition_graph:
> need_abort:59 - Triggered transition abort (complete=1) : Non-status change
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: +
> <configuration >
> Dec 23 11:16:25 bart crmd: [3315]: info: need_abort: Aborting on change to
> admin_epoch
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: +
> <resources >
> Dec 23 11:16:25 bart crmd: [3315]: info: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
> origin=abort_transition_graph ]
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: +
> <group id="rt-grp" >
> Dec 23 11:16:25 bart crmd: [3315]: info: do_state_transition: All 2 cluster
> nodes are eligible to run resources.
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: +
>   <meta_attributes id="rt-grp-meta_attributes" >
> Dec 23 11:16:25 bart crmd: [3315]: info: do_pe_invoke: Query 116: Requesting
> the current CIB: S_POLICY_ENGINE
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: +
>     <nvpair value="Started" id="rt-grp-meta_attributes-target-role" />
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: +
>   </meta_attributes>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: +
> </group>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: +
> </resources>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: +
> </configuration>
> Dec 23 11:16:25 bart cib: [3311]: info: log_data_element: cib:diff: + </cib>
> Dec 23 11:16:25 bart cib: [3311]: info: cib_process_request: Operation
> complete: op cib_replace for section resources (origin=local/cibadmin/2,
> version=0.89.1): ok (rc=0)
> Dec 23 11:16:26 bart crmd: [3315]: info: do_pe_invoke_callback: Invoking the
> PE: query=116, ref=pe_calc-dc-1324635386-87, seq=96, quorate=1
> Dec 23 11:16:26 bart pengine: [3314]: notice: unpack_config: On loss of CCM
> Quorum: Ignore
> Dec 23 11:16:26 bart pengine: [3314]: info: unpack_config: Node scores:
> 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> Dec 23 11:16:26 bart cib: [4274]: info: write_cib_contents: Archived
> previous version as /var/lib/heartbeat/crm/cib-63.raw
> Dec 23 11:16:26 bart pengine: [3314]: info: determine_online_status: Node
> bart is online
> Dec 23 11:16:26 bart pengine: [3314]: info: determine_online_status: Node
> lisa is online
> Dec 23 11:16:26 bart pengine: [3314]: notice: unpack_rsc_op: Operation
> rt-drbd:1_monitor_0 found resource rt-drbd:1 active on lisa
> Dec 23 11:16:26 bart pengine: [3314]: notice: group_print:  Resource Group:
> rt-grp
> Dec 23 11:16:26 bart pengine: [3314]: notice: native_print:      rt-apache2
>     (ocf::heartbeat:apache):        Stopped  (unmanaged)
> Dec 23 11:16:26 bart pengine: [3314]: notice: native_print:      fs-data
>    (ocf::heartbeat:Filesystem):    Stopped
> Dec 23 11:16:26 bart pengine: [3314]: notice: native_print:      rt-mysql
>     (lsb:mysql):    Stopped
> Dec 23 11:16:26 bart pengine: [3314]: notice: native_print:      rt-vip
> (ocf::heartbeat:IPaddr2):       Stopped
> Dec 23 11:16:26 bart pengine: [3314]: notice: clone_print:  Master/Slave
> Set: ms-drbd-rt
> Dec 23 11:16:26 bart pengine: [3314]: notice: short_print:      Masters: [
> bart ]
> Dec 23 11:16:26 bart cib: [4274]: info: write_cib_contents: Wrote version
> 0.89.0 of the CIB to disk (digest: cd534ad8c2b3c1f8add883e157966248)
> Dec 23 11:16:26 bart pengine: [3314]: notice: short_print:      Slaves: [
> lisa ]
> Dec 23 11:16:26 bart pengine: [3314]: info: master_color: Promoting
> rt-drbd:0 (Master bart)
> Dec 23 11:16:26 bart pengine: [3314]: info: master_color: ms-drbd-rt:
> Promoted 1 instances of a possible 1 to master
> Dec 23 11:16:26 bart pengine: [3314]: info: native_color: Unmanaged resource
> rt-apache2 allocated to 'nowhere': inactive
> Dec 23 11:16:26 bart pengine: [3314]: info: native_merge_weights: fs-data:
> Rolling back scores from rt-mysql
> Dec 23 11:16:26 bart cib: [4274]: info: retrieveCib: Reading cluster
> configuration from: /var/lib/heartbeat/crm/cib.f8vE7m (digest:
> /var/lib/heartbeat/crm/cib.JaAGaL)
> Dec 23 11:16:26 bart pengine: [3314]: info: native_color: Resource fs-data
> cannot run anywhere
> Dec 23 11:16:26 bart pengine: [3314]: info: native_merge_weights: rt-mysql:
> Rolling back scores from rt-vip
> Dec 23 11:16:26 bart pengine: [3314]: info: native_color: Resource rt-mysql
> cannot run anywhere
> Dec 23 11:16:26 bart pengine: [3314]: info: native_color: Resource rt-vip
> cannot run anywhere
> Dec 23 11:16:26 bart pengine: [3314]: info: master_color: Promoting
> rt-drbd:0 (Master bart)
> Dec 23 11:16:26 bart pengine: [3314]: info: master_color: ms-drbd-rt:
> Promoted 1 instances of a possible 1 to master
> Dec 23 11:16:26 bart pengine: [3314]: ERROR: create_notification_boundaries:
> Creating boundaries for ms-drbd-rt
> Dec 23 11:16:26 bart pengine: [3314]: ERROR: create_notification_boundaries:
> Creating boundaries for ms-drbd-rt
> Dec 23 11:16:26 bart pengine: [3314]: ERROR: create_notification_boundaries:
> Creating boundaries for ms-drbd-rt
> Dec 23 11:16:26 bart pengine: [3314]: ERROR: create_notification_boundaries:
> Creating boundaries for ms-drbd-rt
> Dec 23 11:16:26 bart pengine: [3314]: notice: LogActions: Leave resource
> rt-apache2     (Stopped unmanaged)
> Dec 23 11:16:26 bart pengine: [3314]: notice: LogActions: Leave resource
> fs-data        (Stopped)
> Dec 23 11:16:26 bart pengine: [3314]: notice: LogActions: Leave resource
> rt-mysql       (Stopped)
> Dec 23 11:16:26 bart pengine: [3314]: notice: LogActions: Leave resource
> rt-vip (Stopped)
> Dec 23 11:16:26 bart pengine: [3314]: notice: LogActions: Leave resource
> rt-drbd:0      (Master bart)
> Dec 23 11:16:26 bart pengine: [3314]: notice: LogActions: Leave resource
> rt-drbd:1      (Slave lisa)
> Dec 23 11:16:26 bart crmd: [3315]: info: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Dec 23 11:16:26 bart crmd: [3315]: info: unpack_graph: Unpacked transition
> 17: 0 actions in 0 synapses
> Dec 23 11:16:26 bart crmd: [3315]: info: do_te_invoke: Processing graph 17
> (ref=pe_calc-dc-1324635386-87) derived from
> /var/lib/pengine/pe-input-1084.bz2
> Dec 23 11:16:26 bart pengine: [3314]: info: process_pe_message: Transition
> 17: PEngine Input stored in: /var/lib/pengine/pe-input-1084.bz2
> Dec 23 11:16:26 bart crmd: [3315]: info: run_graph:
> ====================================================
> Dec 23 11:16:26 bart crmd: [3315]: notice: run_graph: Transition 17
> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pengine/pe-input-1084.bz2): Complete
> Dec 23 11:16:26 bart crmd: [3315]: info: te_graph_trigger: Transition 17 is
> now complete
> Dec 23 11:16:26 bart crmd: [3315]: info: notify_crmd: Transition 17 status:
> done - <null>
> Dec 23 11:16:26 bart crmd: [3315]: info: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Dec 23 11:16:26 bart crmd: [3315]: info: do_state_transition: Starting
> PEngine Recheck Timer
>
>
> I don't understand what is wrong in my configuration. Do you have any idea?

Additional documentation available at:

http://www.clusterlabs.org/wiki/DRBD_MySQL_HowTo

http://www.drbd.org/users-guide-8.3/s-pacemaker-crm-drbd-backed-service.html

http://www.linbit.com/en/education/tech-guides/mysql-high-availability-on-the-pacemaker-cluster-stack/

http://www.hastexo.com/content/mysql-high-availability-sprint-launch-pacemaker

P.S. The mailing list has changed; this is the new one that should be used.

HTH,
Dan

>
> Regards
> Anthony
>
>
>
> _______________________________________________
> Openais mailing list
> Openais@xxxxxxxxxxxxxxxxxxxxxxxxxx
> https://lists.linuxfoundation.org/mailman/listinfo/openais



--
Dan Frincu
CCNA, RHCE

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
