Andrew Beekhof <andrew@xxxxxxxxxxx> writes:

> On 27 Aug 2014, at 3:40 am, Ferenc Wagner <wferi@xxxxxxx> wrote:
>
>> Andrew Beekhof <andrew@xxxxxxxxxxx> writes:
>>
>>> On 22 Aug 2014, at 10:37 am, Ferenc Wagner <wferi@xxxxxxx> wrote:
>>>
>>>> While my Pacemaker cluster was in maintenance mode, resources were moved
>>>> (by hand) between the nodes as I rebooted each node in turn. In the end
>>>> the crm status output became perfectly empty, as the reboot of a given
>>>> node removed from the output the resources which were located on the
>>>> rebooted node at the time of entering maintenance mode. I expected full
>>>> resource discovery on exiting maintenance mode,
>>
>> I experienced this with Pacemaker version 1.1.7. I know it's old
>> and it suffers from crmd segfault on entering maintenance mode (cf.
>> http://thread.gmane.org/gmane.linux.highavailability.user/39121), but
>> works well generally so I did not get to upgrade it yet. Now that I
>> mentioned the crmd segfault: I noted that it died on the DC when I
>> entered maintenance mode:
>>
>> crmd: [7452]: info: te_rsc_command: Initiating action 64: cancel vm-tmvp_monitor_60000 on n01 (local)
>> crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
>
> That looks like the lrmd died.

It did not die, at least not fully. After entering maintenance mode
crmd asked lrmd to cancel the recurring monitor ops for all resources:

08:40:18 crmd: [7452]: info: do_te_invoke: Processing graph 20578 (ref=pe_calc-dc-1408516818-30681) derived from /var/lib/pengine/pe-input-848.bz2
08:40:18 crmd: [7452]: info: te_rsc_command: Initiating action 17: cancel dlm:0_monitor_120000 on n04
08:40:18 crmd: [7452]: info: te_rsc_command: Initiating action 84: cancel dlm:0_cancel_120000 on n01 (local)
08:40:18 lrmd: [7449]: info: cancel_op: operation monitor[194] on dlm:0 for client 7452, its parameters: [...] cancelled
08:40:18 crmd: [7452]: info: te_rsc_command: Initiating action 50: cancel dlm:2_monitor_120000 on n02

The stream of monitor op cancellation messages ended with:

08:40:18 crmd: [7452]: info: te_rsc_command: Initiating action 71: cancel vm-mdssq_monitor_60000 on n01 (local)
08:40:18 lrmd: [7449]: info: cancel_op: operation monitor[329] on vm-mdssq for client 7452, its parameters: [...] cancelled
08:40:18 crmd: [7452]: info: process_lrm_event: LRM operation vm-mdssq_monitor_60000 (call=329, status=1, cib-update=0, confirmed=true) Cancelled
08:40:18 crmd: [7452]: notice: run_graph: ==== Transition 20578 (Complete=87, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-848.bz2): Complete
08:40:18 crmd: [7452]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
08:40:18 pengine: [7451]: notice: process_pe_message: Transition 20578: PEngine Input stored in: /var/lib/pengine/pe-input-848.bz2
08:41:28 crmd: [7452]: WARN: action_timer_callback: Timer popped (timeout=10000, abort_level=0, complete=true)
08:41:28 crmd: [7452]: WARN: action_timer_callback: Ignoring timeout while not in transition
[these two lines repeated several times]
08:41:28 crmd: [7452]: WARN: action_timer_callback: Timer popped (timeout=10000, abort_level=0, complete=true)
08:41:28 crmd: [7452]: WARN: action_timer_callback: Ignoring timeout while not in transition
08:41:38 crmd: [7452]: WARN: action_timer_callback: Timer popped (timeout=20000, abort_level=0, complete=true)
08:41:38 crmd: [7452]: WARN: action_timer_callback: Ignoring timeout while not in transition
08:48:05 cib: [7447]: info: cib_stats: Processed 159 operations (23207.00us average, 0% utilization) in the last 10min
08:55:18 crmd: [7452]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (900000ms)
08:55:18 crmd: [7452]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
08:55:18 crmd: [7452]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
08:55:19 pengine: [7451]: notice: stage6: Delaying fencing operations until there are resources to manage
08:55:19 crmd: [7452]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
08:55:19 crmd: [7452]: info: do_te_invoke: Processing graph 20579 (ref=pe_calc-dc-1408517718-30802) derived from /var/lib/pengine/pe-input-849.bz2
08:55:19 crmd: [7452]: info: te_rsc_command: Initiating action 17: cancel dlm:0_monitor_120000 on n04
08:55:19 crmd: [7452]: info: te_rsc_command: Initiating action 84: cancel dlm:0_cancel_120000 on n01 (local)
08:55:19 crmd: [7452]: info: cancel_op: No pending op found for dlm:0:194
08:55:19 lrmd: [7449]: info: on_msg_cancel_op: no operation with id 194

Interestingly, monitor[194], the operation lrmd last mentioned, was the
very first one to be cancelled.

08:55:19 crmd: [7452]: info: te_rsc_command: Initiating action 50: cancel dlm:2_monitor_120000 on n02
08:55:19 crmd: [7452]: info: te_rsc_command: Initiating action 83: cancel vm-cedar_monitor_60000 on n01 (local)
08:55:19 crmd: [7452]: ERROR: lrm_get_rsc(673): failed to receive a reply message of getrsc.
08:55:19 crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
08:55:19 crmd: [7452]: ERROR: lrm_add_rsc(870): failed to send a addrsc message to lrmd via ch_cmd channel.
08:55:19 crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
08:55:19 crmd: [7452]: ERROR: get_lrm_resource: Could not add resource vm-cedar to LRM
08:55:19 crmd: [7452]: ERROR: do_lrm_invoke: Invalid resource definition
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <create_request_adv origin="te_rsc_command" t="crmd" version="3.0.6" subt="request" reference="lrm_invoke-tengine-1408517719-30807" crm_task="lrm_invoke" crm_sys_to="lrmd" crm_sys_from="tengine" crm_host_to="n01" >
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <crm_xml >
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <rsc_op id="83" operation="cancel" operation_key="vm-cedar_monitor_60000" on_node="n01" on_node_uuid="n01" transition-key="83:20579:0:1b0a6e79-af5a-41e4-8ced-299371e7922c" >
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <primitive id="vm-cedar" long-id="vm-cedar" class="ocf" provider="niif" type="TransientDomain" />
08:55:19 crmd: [7452]: ERROR: log_data_element: Output truncated: available=727, needed=1374
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <attributes CRM_meta_call_id="195" [really very long]
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input </rsc_op>
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input </crm_xml>
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input </create_request_adv>

Blocks of messages like the above repeat a couple of times for other
resources, then crmd kicks the bucket and gets restarted:

08:55:19 corosync[6966]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x1dc6ea0, async-conn=0x1dc6ea0) left
08:55:19 pacemakerd: [7443]: WARN: Managed crmd process 7452 killed by signal 11 [SIGSEGV - Segmentation violation].
08:55:19 pacemakerd: [7443]: notice: pcmk_child_exit: Child process crmd terminated with signal 11 (pid=7452, rc=0)
08:55:19 pacemakerd: [7443]: notice: pcmk_child_exit: Respawning failed child process: crmd
08:55:19 pacemakerd: [7443]: info: start_child: Forked child 13794 for process crmd
08:55:19 corosync[6966]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
08:55:19 crmd: [13794]: info: Invoked: /usr/lib/pacemaker/crmd

Anyway, there were no further logs from lrmd after this point until,
hours later, I rebooted the machine:

14:37:06 pacemakerd: [7443]: notice: stop_child: Stopping lrmd: Sent -15 to process 7449
14:37:06 lrmd: [7449]: info: lrmd is shutting down
14:37:06 pacemakerd: [7443]: info: pcmk_child_exit: Child process lrmd exited (pid=7449, rc=0)

So lrmd was alive all the time.

> Which created a condition in the crmd that it couldn't handle so it
> crashed too.

Maybe their connection got severed somehow.

>> However, it got restarted seamlessly, without the node being fenced, so
>> I did not even notice this until now. Should this have resulted in the
>> node being fenced?
>
> Depends how fast the node can respawn.

You mean how fast crmd can respawn? How much time does it have to
respawn to avoid being fenced?

>> crmd: [13794]: ERROR: verify_stopped: Resource vm-web5 was active at shutdown. You may ignore this error if it is unmanaged.
>
> In maintenance mode, everything is unmanaged. So that would be expected.

Is maintenance mode the same as unmanaging all resources? I think the
latter does not cancel the monitor operations here...
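Just to be sure we mean the same thing, this is how I picture the two
settings (crm shell syntax; I have not verified the monitor behaviour
beyond the logs above, so take it as a sketch):

  # cluster-wide maintenance mode, which also cancels the recurring
  # monitor operations, as seen in the logs above:
  crm configure property maintenance-mode=true

  # unmanaging resources, one by one or via the resource defaults;
  # as far as I can tell, the recurring monitors keep running here:
  crm resource unmanage vm-cedar
  crm configure rsc_defaults is-managed=false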
>>> The discovery usually happens at the point the cluster is started on
>>> a node.
>>
>> A local discovery did happen, but it could not find anything, as the
>> cluster was started by the init scripts, well before any resource could
>> have been moved to the freshly rebooted node (manually, to free the next
>> node for rebooting).
>
> Thats your problem then, you've started resources outside of the
> control of the cluster.

Some of them, yes, and I moved the rest between the nodes, all of this
circumventing the cluster.

> Two options... recurring monitor actions with role=Stopped would have
> caught this

Even in maintenance mode? Wouldn't they have been cancelled just like
the ordinary recurring monitor actions? I guess adding them would run a
recurring monitor operation for every resource on every node, only with
different expectations, right? (I try to sketch this below.)

> or you can run crm_resource --cleanup after you've moved resources around.

I actually ran crm resource cleanup for a couple of resources, and those
indeed were not started on exiting maintenance mode.

>>> Maintenance mode just prevents the cluster from doing anything about it.
>>
>> Fine. So I should have restarted Pacemaker on each node before leaving
>> maintenance mode, right? Or is there a better way?
>
> See above

So crm_resource -r whatever -C is the way to go, for each resource
separately. Is there no way to do this for all resources at once?
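For the archives, here is roughly how I picture the two options you
mention; both are untested on my 1.1.7, so please correct me if the
syntax or the behaviour is off:

  # 1. a recurring monitor for the Stopped role in the resource
  #    definition (crm shell syntax, parameters omitted, timeouts made
  #    up); the interval has to differ from the normal monitor's:
  primitive vm-cedar ocf:niif:TransientDomain \
          op monitor interval="60s" timeout="30s" \
          op monitor interval="61s" role="Stopped" timeout="30s"

  # 2. after moving resources around by hand, make the cluster re-check
  #    their state, either per resource:
  crm_resource --cleanup --resource vm-cedar
  #    or, if I read the help right, for everything at once:
  crm_resource --reprobe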
>> You say in the above thread that resource definitions can be changed:
>> http://thread.gmane.org/gmane.linux.highavailability.user/39121/focus=39437
>> Let me quote from there (starting with the words of Ulrich Windl):
>>
>>>>>> I think it's a common misconception that you can modify cluster
>>>>>> resources while in maintenance mode:
>>>>>
>>>>> No, you _should_ be able to. If that's not the case, its a bug.
>>>>
>>>> So the end of maintenance mode starts with a "re-probe"?
>>>
>>> No, but it doesn't need to.
>>> The policy engine already knows if the resource definitions changed
>>> and the recurring monitor ops will find out if any are not running.
>>
>> My experiences show that you may not *move around* resources while in
>> maintenance mode.
>
> Correct
>
>> That would indeed require a cluster-wide re-probe, which does not
>> seem to happen (unless forced some way). Probably there was some
>> misunderstanding in the above discussion, I guess Ulrich meant moving
>> resources when he wrote "modifying cluster resources". Does this
>> make sense?
>
> No, I've reasonably sure he meant changing their definitions in the cib.
> Or at least thats what I thought he meant at the time.

Nobody could blame you for that, because that's what it means. But then
he inquired about a "re-probe", which fits the problem of changing the
status of resources better than that of changing their definition.
Actually, I was so firmly stuck in this mindset that at first I wanted
to ask you to reconsider; your response felt so much out of place.
That's all about history for now...

After all this, I suggest clarifying this issue in the fine manual.
I've read it a couple of times and still got the wrong impression.
-- 
Regards,
Feri.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster