Re: on exiting maintenance mode

Andrew Beekhof <andrew@xxxxxxxxxxx> writes:

> On 22 Aug 2014, at 10:37 am, Ferenc Wagner <wferi@xxxxxxx> wrote:
>
>> While my Pacemaker cluster was in maintenance mode, resources were moved
>> (by hand) between the nodes as I rebooted each node in turn.  In the end
>> the crm status output became perfectly empty, as the reboot of a given
>> node removed from the output the resources which were located on the
>> rebooted node at the time of entering maintenance mode.  I expected full
>> resource discovery on exiting maintenance mode,
>
> Version and logs?

(The more interesting part comes later; please skip to the theoretical
part if you're short on time. :)

I left those out, as I don't expect the actual behavior to be a bug.
For the record, I experienced this with Pacemaker version 1.1.7.  I
know it's old and suffers from a crmd segfault on entering maintenance
mode (cf. http://thread.gmane.org/gmane.linux.highavailability.user/39121),
but it generally works well, so I have not got around to upgrading it
yet.  Speaking of the crmd segfault: I noticed that crmd died on the
DC when I entered maintenance mode:

crmd: [7452]: info: te_rsc_command: Initiating action 64: cancel vm-tmvp_monitor_60000 on n01 (local)
crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
crmd: [7452]: ERROR: get_lrm_resource: Could not add resource vm-tmvp to LRM
crmd: [7452]: ERROR: do_lrm_invoke: Invalid resource definition
crmd: [7452]: WARN: do_lrm_invoke: bad input <create_request_adv origin="te_rsc_command" t="crmd" version="3.0.6" subt="request" reference="lrm_invoke-tengine-1408517719-30820" crm_task="lrm_invoke" crm_sys_to="lrmd" crm_sys_from="tengine" crm_host_to="n01" >
crmd: [7452]: WARN: do_lrm_invoke: bad input   <crm_xml >
crmd: [7452]: WARN: do_lrm_invoke: bad input     <rsc_op id="64" operation="cancel" operation_key="vm-tmvp_monitor_60000" on_node="n01" on_node_uuid="n01" transition-key="64:20579:0:1b0a6e79-af5a-41e4-8ced-299371e7922c" >
crmd: [7452]: WARN: do_lrm_invoke: bad input       <primitive id="vm-tmvp" long-id="vm-tmvp" class="ocf" provider="niif" type="TransientDomain" />
crmd: [7452]: info: te_rsc_command: Initiating action 86: cancel vm-wfweb_monitor_60000 on n01 (local)
crmd: [7452]: ERROR: lrm_add_rsc(870): failed to send a addrsc message to lrmd via ch_cmd channel.
crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
corosync[6966]:   [pcmk  ] info: pcmk_ipc_exit: Client crmd (conn=0x1dc6ea0, async-conn=0x1dc6ea0) left
pacemakerd: [7443]: WARN: Managed crmd process 7452 killed by signal 11 [SIGSEGV - Segmentation violation].
pacemakerd: [7443]: notice: pcmk_child_exit: Child process crmd terminated with signal 11 (pid=7452, rc=0)

However, crmd got restarted seamlessly, without the node being fenced,
so I did not even notice this until now.  Should this have resulted in
the node being fenced?

But back to the issue at hand.  The Pacemaker shutdown seemed normal,
apart from a bunch of messages like:

crmd: [13794]: ERROR: verify_stopped: Resource vm-web5 was active at shutdown.  You may ignore this error if it is unmanaged.

appearing twice, and warnings like:

cib: [7447]: WARN: send_ipc_message: IPC Channel to 13794 is not connected
cib: [7447]: WARN: send_via_callback_channel: Delivery of reply to client 13794/bf6f43a2-70db-40ac-a902-eabc3c12e20d failed
cib: [7447]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
corosync[6966]:   [pcmk  ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)

On reboot, corosync complained until the relevant Pacemaker components
started:

corosync[8461]:   [pcmk  ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
corosync[8461]:   [pcmk  ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)

Pacemaker then probed the resources on the local node (all were inactive):

lrmd: [8946]: info: rsc:stonith-n01 probe[5] (pid 9081)
lrmd: [8946]: info: rsc:dlm:0 probe[6] (pid 9082)
[...]
lrmd: [8946]: info: operation monitor[112] on vm-fir for client 8949: pid 12015 exited with return code 7
crmd: [8949]: info: process_lrm_event: LRM operation vm-fir_monitor_0 (call=112, rc=7, cib-update=130, confirmed=true) not running
attrd: [8947]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
attrd: [8947]: notice: attrd_perform_update: Sent update 4: probe_complete=true

Then I cleaned up some resources running on other nodes, which made
them show up in the crm status output again, producing log lines like:

crmd: [8949]: WARN: status_from_rc: Action 4 (vm-web5_monitor_0) on n02 failed (target: 7 vs. rc: 0): Error
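
(The cleanups were plain resource cleanups, something like the
following, using vm-web5 from the log line above as an example:

  crm resource cleanup vm-web5
  # or, with the lower-level tool:
  crm_resource --cleanup --resource vm-web5
)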

Finally, I exited maintenance mode, and Pacemaker started every
resource I had not cleaned up beforehand, in parallel with their
already running instances:

pengine: [8948]: notice: LogActions: Start   vm-web9#011(n03)

I can provide more logs if this behavior is indeed unexpected, but it
looks more like I'm missing the exact concept of maintenance mode.

> The discovery usually happens at the point the cluster is started on a node.

A local discovery did happen, but it could not find anything, as the
cluster was started by the init scripts, well before any resource could
have been moved to the freshly rebooted node (manually, to free the next
node for rebooting).

> Maintenance mode just prevents the cluster from doing anything about it.

Fine.  So I should have restarted Pacemaker on each node before leaving
maintenance mode, right?  Or is there a better way?  (Unfortunately, I
could not manage the rolling reboot through Pacemaker, as a DLM/cLVM
freeze made normal cluster operation impossible.)
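
(If a forced cluster-wide re-probe is the answer, I guess it would be
something like running

  crm resource reprobe

before clearing maintenance-mode; please correct me if that is not the
sanctioned way.)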

>> but it probably did not happen, as the cluster started up resources
>> already running on other nodes, which is generally forbidden.  Given
>> that all resources were running (though possibly migrated during the
>> maintenance), what would have been the correct way of bringing the
>> cluster out of maintenance mode?  This should have required no
>> resource actions at all.  Would cleanup of all resources have helped?
>> Or is there a better way?

You say in the above thread that resource definitions can be changed:
http://thread.gmane.org/gmane.linux.highavailability.user/39121/focus=39437
Let me quote from there (starting with the words of Ulrich Windl):

>>>> I think it's a common misconception that you can modify cluster
>>>> resources while in maintenance mode:
>>> 
>>> No, you _should_ be able to.  If that's not the case, its a bug.
>> 
>> So the end of maintenance mode starts with a "re-probe"?
>
> No, but it doesn't need to.  
> The policy engine already knows if the resource definitions changed
> and the recurring monitor ops will find out if any are not running.

My experience shows that you must not *move* resources around while in
maintenance mode.  Handling that would indeed require a cluster-wide
re-probe, which does not seem to happen (unless forced somehow).
There was probably some misunderstanding in the above discussion; I
guess Ulrich meant moving resources when he wrote "modifying cluster
resources" (see the concrete example below).  Does this make sense?
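
(To make the distinction concrete: by "modifying" I would understand
something like

  crm configure edit vm-web5

to tweak a parameter in the definition, whereas what I did was stop
and start the resources by hand on different nodes, which only a
re-probe can discover.)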
-- 
Thanks,
Feri.




