Andrew Beekhof <andrew@xxxxxxxxxxx> writes:

> On 27 Aug 2014, at 3:40 am, Ferenc Wagner <wferi@xxxxxxx> wrote:
>
>> Andrew Beekhof <andrew@xxxxxxxxxxx> writes:
>>
>>> On 22 Aug 2014, at 10:37 am, Ferenc Wagner <wferi@xxxxxxx> wrote:
>>>
>>>> While my Pacemaker cluster was in maintenance mode, resources were moved
>>>> (by hand) between the nodes as I rebooted each node in turn. In the end
>>>> the crm status output became perfectly empty, as the reboot of a given
>>>> node removed from the output the resources which were located on the
>>>> rebooted node at the time of entering maintenance mode. I expected full
>>>> resource discovery on exiting maintenance mode,
>>
>> I experienced this with Pacemaker version 1.1.7. I know it's old
>> and it suffers from crmd segfault on entering maintenance mode (cf.
>> http://thread.gmane.org/gmane.linux.highavailability.user/39121), but
>> works well generally so I did not get to upgrade it yet. Now that I
>> mentioned the crmd segfault: I noted that it died on the DC when I
>> entered maintenance mode:
>>
>> crmd: [7452]: info: te_rsc_command: Initiating action 64: cancel vm-tmvp_monitor_60000 on n01 (local)
>> crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
>
> That looks like the lrmd died.

It did not die, at least not fully. After entering maintenance mode
crmd asked lrmd to cancel the recurring monitor ops for all resources:

08:40:18 crmd: [7452]: info: do_te_invoke: Processing graph 20578 (ref=pe_calc-dc-1408516818-30681) derived from /var/lib/pengine/pe-input-848.bz2
08:40:18 crmd: [7452]: info: te_rsc_command: Initiating action 17: cancel dlm:0_monitor_120000 on n04
08:40:18 crmd: [7452]: info: te_rsc_command: Initiating action 84: cancel dlm:0_cancel_120000 on n01 (local)
08:40:18 lrmd: [7449]: info: cancel_op: operation monitor[194] on dlm:0 for client 7452, its parameters: [...] cancelled
08:40:18 crmd: [7452]: info: te_rsc_command: Initiating action 50: cancel dlm:2_monitor_120000 on n02

The stream of monitor op cancellation messages ended with:

08:40:18 crmd: [7452]: info: te_rsc_command: Initiating action 71: cancel vm-mdssq_monitor_60000 on n01 (local)
08:40:18 lrmd: [7449]: info: cancel_op: operation monitor[329] on vm-mdssq for client 7452, its parameters: [...] cancelled
08:40:18 crmd: [7452]: info: process_lrm_event: LRM operation vm-mdssq_monitor_60000 (call=329, status=1, cib-update=0, confirmed=true) Cancelled
08:40:18 crmd: [7452]: notice: run_graph: ==== Transition 20578 (Complete=87, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-848.bz2): Complete
08:40:18 crmd: [7452]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
08:40:18 pengine: [7451]: notice: process_pe_message: Transition 20578: PEngine Input stored in: /var/lib/pengine/pe-input-848.bz2
08:41:28 crmd: [7452]: WARN: action_timer_callback: Timer popped (timeout=10000, abort_level=0, complete=true)
08:41:28 crmd: [7452]: WARN: action_timer_callback: Ignoring timeout while not in transition
[these two lines repeated several times]
08:41:28 crmd: [7452]: WARN: action_timer_callback: Timer popped (timeout=10000, abort_level=0, complete=true)
08:41:28 crmd: [7452]: WARN: action_timer_callback: Ignoring timeout while not in transition
08:41:38 crmd: [7452]: WARN: action_timer_callback: Timer popped (timeout=20000, abort_level=0, complete=true)
08:41:38 crmd: [7452]: WARN: action_timer_callback: Ignoring timeout while not in transition
08:48:05 cib: [7447]: info: cib_stats: Processed 159 operations (23207.00us average, 0% utilization) in the last 10min
08:55:18 crmd: [7452]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (900000ms)
08:55:18 crmd: [7452]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
08:55:18 crmd: [7452]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
08:55:19 pengine: [7451]: notice: stage6: Delaying fencing operations until there are resources to manage
08:55:19 crmd: [7452]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
08:55:19 crmd: [7452]: info: do_te_invoke: Processing graph 20579 (ref=pe_calc-dc-1408517718-30802) derived from /var/lib/pengine/pe-input-849.bz2
08:55:19 crmd: [7452]: info: te_rsc_command: Initiating action 17: cancel dlm:0_monitor_120000 on n04
08:55:19 crmd: [7452]: info: te_rsc_command: Initiating action 84: cancel dlm:0_cancel_120000 on n01 (local)
08:55:19 crmd: [7452]: info: cancel_op: No pending op found for dlm:0:194
08:55:19 lrmd: [7449]: info: on_msg_cancel_op: no operation with id 194

Interestingly, monitor[194], the operation lrmd last mentioned, was the
very first one to be cancelled.

08:55:19 crmd: [7452]: info: te_rsc_command: Initiating action 50: cancel dlm:2_monitor_120000 on n02
08:55:19 crmd: [7452]: info: te_rsc_command: Initiating action 83: cancel vm-cedar_monitor_60000 on n01 (local)
08:55:19 crmd: [7452]: ERROR: lrm_get_rsc(673): failed to receive a reply message of getrsc.
08:55:19 crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
08:55:19 crmd: [7452]: ERROR: lrm_add_rsc(870): failed to send a addrsc message to lrmd via ch_cmd channel.
08:55:19 crmd: [7452]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
08:55:19 crmd: [7452]: ERROR: get_lrm_resource: Could not add resource vm-cedar to LRM
08:55:19 crmd: [7452]: ERROR: do_lrm_invoke: Invalid resource definition
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <create_request_adv origin="te_rsc_command" t="crmd" version="3.0.6" subt="request" reference="lrm_invoke-tengine-1408517719-30807" crm_task="lrm_invoke" crm_sys_to="lrmd" crm_sys_from="tengine" crm_host_to="n01" >
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <crm_xml >
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <rsc_op id="83" operation="cancel" operation_key="vm-cedar_monitor_60000" on_node="n01" on_node_uuid="n01" transition-key="83:20579:0:1b0a6e79-af5a-41e4-8ced-299371e7922c" >
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <primitive id="vm-cedar" long-id="vm-cedar" class="ocf" provider="niif" type="TransientDomain" />
08:55:19 crmd: [7452]: ERROR: log_data_element: Output truncated: available=727, needed=1374
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input <attributes CRM_meta_call_id="195" [really very long]
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input </rsc_op>
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input </crm_xml>
08:55:19 crmd: [7452]: WARN: do_lrm_invoke: bad input </create_request_adv>

Blocks of messages like the above repeat a couple of times for other
resources, then crmd kicks the bucket and gets restarted:

08:55:19 corosync[6966]: [pcmk ] info: pcmk_ipc_exit: Client crmd (conn=0x1dc6ea0, async-conn=0x1dc6ea0) left
08:55:19 pacemakerd: [7443]: WARN: Managed crmd process 7452 killed by signal 11 [SIGSEGV - Segmentation violation].
08:55:19 pacemakerd: [7443]: notice: pcmk_child_exit: Child process crmd terminated with signal 11 (pid=7452, rc=0)
08:55:19 pacemakerd: [7443]: notice: pcmk_child_exit: Respawning failed child process: crmd
08:55:19 pacemakerd: [7443]: info: start_child: Forked child 13794 for process crmd
08:55:19 corosync[6966]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
08:55:19 crmd: [13794]: info: Invoked: /usr/lib/pacemaker/crmd

Anyway, there were no further logs from lrmd after this point until,
hours later, I rebooted the machine:

14:37:06 pacemakerd: [7443]: notice: stop_child: Stopping lrmd: Sent -15 to process 7449
14:37:06 lrmd: [7449]: info: lrmd is shutting down
14:37:06 pacemakerd: [7443]: info: pcmk_child_exit: Child process lrmd exited (pid=7449, rc=0)

So lrmd was alive all the time.

> Which created a condition in the crmd that it couldn't handle so it
> crashed too.

Maybe their connection got severed somehow.

>> However, it got restarted seamlessly, without the node being fenced, so
>> I did not even notice this until now. Should this have resulted in the
>> node being fenced?
>
> Depends how fast the node can respawn.

You mean how fast crmd can respawn? How much time does it have to
respawn to avoid being fenced?

>> crmd: [13794]: ERROR: verify_stopped: Resource vm-web5 was active at shutdown. You may ignore this error if it is unmanaged.
>
> In maintenance mode, everything is unmanaged. So that would be expected.

Is maintenance mode the same as unmanaging all resources? I think the
latter does not cancel the monitor operations here...
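Just to be sure we mean the same thing, this is how I picture the two
settings (crm shell syntax; I have not verified the monitor behaviour
beyond the logs above, so take it as a sketch):

  # cluster-wide maintenance mode, which also cancels the recurring
  # monitor operations, as seen in the logs above:
  crm configure property maintenance-mode=true

  # unmanaging resources, one by one or via the resource defaults;
  # as far as I can tell, the recurring monitors keep running here:
  crm resource unmanage vm-cedar
  crm configure rsc_defaults is-managed=false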
>>> The discovery usually happens at the point the cluster is started on
>>> a node.
>>
>> A local discovery did happen, but it could not find anything, as the
>> cluster was started by the init scripts, well before any resource could
>> have been moved to the freshly rebooted node (manually, to free the next
>> node for rebooting).
>
> Thats your problem then, you've started resources outside of the
> control of the cluster.

Some of them, yes, and I moved the rest between the nodes, all of this
circumventing the cluster.

> Two options... recurring monitor actions with role=Stopped would have
> caught this

Even in maintenance mode? Wouldn't they have been cancelled just like
the ordinary recurring monitor actions? I guess adding them would run a
recurring monitor operation for every resource on every node, only with
different expectations, right? (I try to sketch this below.)

> or you can run crm_resource --cleanup after you've moved resources around.

I actually ran crm resource cleanup for a couple of resources, and those
indeed were not started on exiting maintenance mode.

>>> Maintenance mode just prevents the cluster from doing anything about it.
>>
>> Fine. So I should have restarted Pacemaker on each node before leaving
>> maintenance mode, right? Or is there a better way?
>
> See above

So crm_resource -r whatever -C is the way to go, for each resource
separately. Is there no way to do this for all resources at once?
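For the archives, here is roughly how I picture the two options you
mention; both are untested on my 1.1.7, so please correct me if the
syntax or the behaviour is off:

  # 1. a recurring monitor for the Stopped role in the resource
  #    definition (crm shell syntax, parameters omitted, timeouts made
  #    up); the interval has to differ from the normal monitor's:
  primitive vm-cedar ocf:niif:TransientDomain \
          op monitor interval="60s" timeout="30s" \
          op monitor interval="61s" role="Stopped" timeout="30s"

  # 2. after moving resources around by hand, make the cluster re-check
  #    their state, either per resource:
  crm_resource --cleanup --resource vm-cedar
  #    or, if I read the help right, for everything at once:
  crm_resource --reprobe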
>> You say in the above thread that resource definitions can be changed:
>> http://thread.gmane.org/gmane.linux.highavailability.user/39121/focus=39437
>> Let me quote from there (starting with the words of Ulrich Windl):
>>
>>>>>> I think it's a common misconception that you can modify cluster
>>>>>> resources while in maintenance mode:
>>>>>
>>>>> No, you _should_ be able to. If that's not the case, its a bug.
>>>>
>>>> So the end of maintenance mode starts with a "re-probe"?
>>>
>>> No, but it doesn't need to.
>>> The policy engine already knows if the resource definitions changed
>>> and the recurring monitor ops will find out if any are not running.
>>
>> My experiences show that you may not *move around* resources while in
>> maintenance mode.
>
> Correct
>
>> That would indeed require a cluster-wide re-probe, which does not
>> seem to happen (unless forced some way). Probably there was some
>> misunderstanding in the above discussion, I guess Ulrich meant moving
>> resources when he wrote "modifying cluster resources". Does this
>> make sense?
>
> No, I've reasonably sure he meant changing their definitions in the cib.
> Or at least thats what I thought he meant at the time.

Nobody could blame you for that, because that's what it means. But then
he inquired about a "re-probe", which fits the problem of changing the
status of resources better than that of changing their definition.
Actually, I was so firmly stuck in this mindset that at first I wanted
to ask you to reconsider; your response felt so much out of place.
That's all about history for now...

After all this, I suggest clarifying this issue in the fine manual.
I've read it a couple of times and still got the wrong impression.
-- 
Regards,
Feri.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster