On 27 Aug 2014, at 3:40 am, Ferenc Wagner <wferi@xxxxxxx> wrote:

> Andrew Beekhof <andrew@xxxxxxxxxxx> writes:
>
>> On 22 Aug 2014, at 10:37 am, Ferenc Wagner <wferi@xxxxxxx> wrote:
>>
>>> While my Pacemaker cluster was in maintenance mode, resources were moved
>>> (by hand) between the nodes as I rebooted each node in turn. In the end
>>> the crm status output became perfectly empty, as the reboot of a given
>>> node removed from the output the resources which were located on the
>>> rebooted node at the time of entering maintenance mode. I expected full
>>> resource discovery on exiting maintenance mode,
>>
>> Version and logs?
>
> (The more interesting part comes later, please skip to the theoretical
> part if you're short on time. :)
>
> I left those out, as I don't expect the actual behavior to be a bug.
> But I experienced this with Pacemaker version 1.1.7. I know it's old

No kidding :)

> and it suffers from crmd segfault on entering maintenance mode (cf.
> http://thread.gmane.org/gmane.linux.highavailability.user/39121), but
> works well generally so I did not get to upgrade it yet. Now that I
> mentioned the crmd segfault: I noted that it died on the DC when I
> entered maintenance mode:
>
> crmd: [7452]: info: te_rsc_command: Initiating action 64: cancel vm-tmvp_monitor_60000 on n01 (local)
> crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.

That looks like the lrmd died.

> crmd: [7452]: ERROR: get_lrm_resource: Could not add resource vm-tmvp to LRM
> crmd: [7452]: ERROR: do_lrm_invoke: Invalid resource definition
> crmd: [7452]: WARN: do_lrm_invoke: bad input <create_request_adv origin="te_rsc_command" t="crmd" version="3.0.6" subt="request" reference="lrm_invoke-tengine-1408517719-30820" crm_task="lrm_invoke" crm_sys_to="lrmd" crm_sys_from="tengine" crm_host_to="n01" >
> crmd: [7452]: WARN: do_lrm_invoke: bad input <crm_xml >
> crmd: [7452]: WARN: do_lrm_invoke: bad input <rsc_op id="64" operation="cancel" operation_key="vm-tmvp_monitor_60000" on_node="n01" on_node_uuid="n01" transition-key="64:20579:0:1b0a6e79-af5a-41e4-8ced-299371e7922c" >
> crmd: [7452]: WARN: do_lrm_invoke: bad input <primitive id="vm-tmvp" long-id="vm-tmvp" class="ocf" provider="niif" type="TransientDomain" />
> crmd: [7452]: info: te_rsc_command: Initiating action 86: cancel vm-wfweb_monitor_60000 on n01 (local)
> crmd: [7452]: ERROR: lrm_add_rsc(870): failed to send a addrsc message to lrmd via ch_cmd channel.
> crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
> corosync[6966]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x1dc6ea0, async-conn=0x1dc6ea0) left
> pacemakerd: [7443]: WARN: Managed crmd process 7452 killed by signal 11 [SIGSEGV - Segmentation violation].

Which created a condition in the crmd that it couldn't handle, so it crashed too.

> pacemakerd: [7443]: notice: pcmk_child_exit: Child process crmd terminated with signal 11 (pid=7452, rc=0)
>
> However, it got restarted seamlessly, without the node being fenced, so
> I did not even notice this until now. Should this have resulted in the
> node being fenced?

Depends how fast the node can respawn.

>
> But back to the issue at hand. The Pacemaker shutdown seemed normal,
> apart from the bunch of messages like:
>
> crmd: [13794]: ERROR: verify_stopped: Resource vm-web5 was active at shutdown. You may ignore this error if it is unmanaged.

In maintenance mode, everything is unmanaged. So that would be expected.
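
(For reference, maintenance mode is just the maintenance-mode cluster
property. A minimal sketch with the crm shell, assuming crmsh is what you
use; the property name is real, everything else is only illustrative:

    # every resource becomes unmanaged: the cluster stops starting,
    # stopping and recovering things
    crm configure property maintenance-mode=true

    # ... manual surgery, reboots, etc. ...

    # hand control back to the cluster
    crm configure property maintenance-mode=false
)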
>
> appearing twice and warnings like:
>
> cib: [7447]: WARN: send_ipc_message: IPC Channel to 13794 is not connected
> cib: [7447]: WARN: send_via_callback_channel: Delivery of reply to client 13794/bf6f43a2-70db-40ac-a902-eabc3c12e20d failed
> cib: [7447]: WARN: do_local_notify: A-Sync reply to crmd failed: reply failed
> corosync[6966]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
>
> On reboot, corosync complained until some Pacemaker components
> started:
>
> corosync[8461]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
> corosync[8461]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
>
> Pacemaker then probed the resources on the local node (all were inactive):
>
> lrmd: [8946]: info: rsc:stonith-n01 probe[5] (pid 9081)
> lrmd: [8946]: info: rsc:dlm:0 probe[6] (pid 9082)
> [...]
> lrmd: [8946]: info: operation monitor[112] on vm-fir for client 8949: pid 12015 exited with return code 7
> crmd: [8949]: info: process_lrm_event: LRM operation vm-fir_monitor_0 (call=112, rc=7, cib-update=130, confirmed=true) not running
> attrd: [8947]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> attrd: [8947]: notice: attrd_perform_update: Sent update 4: probe_complete=true
>
> Then I cleaned up some resources running on other nodes, which resulted
> in those showing up in the crm status output, producing log lines like e.g.:
>
> crmd: [8949]: WARN: status_from_rc: Action 4 (vm-web5_monitor_0) on n02 failed (target: 7 vs. rc: 0): Error
>
> Finally, I exited maintenance mode, and Pacemaker started every resource
> I did not clean up beforehand, concurrently with their already running
> instances:
>
> pengine: [8948]: notice: LogActions: Start vm-web9#011(n03)
>
> I can provide more logs if this behavior is indeed unexpected, but it
> looks more like I'm missing the exact concept of maintenance mode.
>
>> The discovery usually happens at the point the cluster is started on a node.
>
> A local discovery did happen, but it could not find anything, as the
> cluster was started by the init scripts, well before any resource could
> have been moved to the freshly rebooted node (manually, to free the next
> node for rebooting).

That's your problem then: you've started resources outside of the control of the cluster.
Two options... recurring monitor actions with role=Stopped would have caught this, or
you can run crm_resource --cleanup after you've moved resources around (rough sketch below).

>
>> Maintenance mode just prevents the cluster from doing anything about it.
>
> Fine. So I should have restarted Pacemaker on each node before leaving
> maintenance mode, right? Or is there a better way?

See above

> (Unfortunately, I
> could not manage the rolling reboot through Pacemaker, as some DLM/cLVM
> freeze made the cluster inoperable in its normal way.)
>
>>> but it probably did not happen, as the cluster started up resources
>>> already running on other nodes, which is generally forbidden. Given
>>> that all resources were running (though possibly migrated during the
>>> maintenance), what would have been the correct way of bringing the
>>> cluster out of maintenance mode? This should have required no
>>> resource actions at all. Would cleanup of all resources have helped?
>>> Or is there a better way?
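
To make the two options above concrete, an untested sketch (vm-web9 and
ocf:niif:TransientDomain are taken from your own logs, the intervals and
timeouts are made up, and the params are omitted; adjust to taste):

    # crm shell syntax: a second recurring monitor with role=Stopped lets the
    # cluster notice copies of a resource running where it thinks nothing is.
    # The two monitors must use different intervals.
    primitive vm-web9 ocf:niif:TransientDomain \
        op monitor interval=60s timeout=90s \
        op monitor interval=120s role=Stopped timeout=90s

    # or, after moving a resource by hand, have the cluster rediscover its
    # state before you leave maintenance mode:
    crm_resource --cleanup --resource vm-web9

Either way the cluster learns about the copies you started by hand before
it is allowed to act on them again.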
>
> You say in the above thread that resource definitions can be changed:
> http://thread.gmane.org/gmane.linux.highavailability.user/39121/focus=39437
> Let me quote from there (starting with the words of Ulrich Windl):
>
>>>>> I think it's a common misconception that you can modify cluster
>>>>> resources while in maintenance mode:
>>>>
>>>> No, you _should_ be able to. If that's not the case, it's a bug.
>>>
>>> So the end of maintenance mode starts with a "re-probe"?
>>
>> No, but it doesn't need to.
>> The policy engine already knows if the resource definitions changed
>> and the recurring monitor ops will find out if any are not running.
>
> My experiences show that you may not *move around* resources while in
> maintenance mode.

Correct

> That would indeed require a cluster-wide re-probe,
> which does not seem to happen (unless forced some way). Probably there
> was some misunderstanding in the above discussion, I guess Ulrich meant
> moving resources when he wrote "modifying cluster resources". Does this
> make sense?

No, I'm reasonably sure he meant changing their definitions in the CIB.
Or at least that's what I thought he meant at the time.

> --
> Thanks,
> Feri.
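
As for forcing that cluster-wide re-probe before leaving maintenance mode,
crm_resource should be able to do it. Roughly like this, though the option
spelling has varied between releases, so check crm_resource --help on 1.1.7:

    # ask the nodes to re-run the one-shot probe operations so the status
    # section reflects where resources are actually running ...
    crm_resource --reprobe

    # ... and only then hand control back
    crm configure property maintenance-mode=false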