On 28 Aug 2014, at 3:09 am, Ferenc Wagner <wferi@xxxxxxx> wrote:

> Andrew Beekhof <andrew@xxxxxxxxxxx> writes:
>
>> On 27 Aug 2014, at 3:40 am, Ferenc Wagner <wferi@xxxxxxx> wrote:
>>
>>> However, it got restarted seamlessly, without the node being fenced,
>>> so I did not even notice this until now. Should this have resulted
>>> in the node being fenced?
>>
>> Depends how fast the node can respawn.
>
> You mean how fast crmd can respawn? How much time does it have to
> respawn to avoid being fenced?

Until a new node can be elected DC, invoke the policy engine, and start fencing.

>>> crmd: [13794]: ERROR: verify_stopped: Resource vm-web5 was active at shutdown. You may ignore this error if it is unmanaged.
>>
>> In maintenance mode, everything is unmanaged. So that would be expected.
>
> Is maintenance mode the same as unmanaging all resources? I think the
> latter does not cancel the monitor operations here...

Right. One cancels monitor operations too.

>>>> The discovery usually happens at the point the cluster is started
>>>> on a node.
>>>
>>> A local discovery did happen, but it could not find anything, as the
>>> cluster was started by the init scripts, well before any resource
>>> could have been moved to the freshly rebooted node (manually, to
>>> free the next node for rebooting).
>>
>> That's your problem then: you've started resources outside of the
>> control of the cluster.
>
> Some of them, yes, and moved the rest between the nodes. All this
> circumventing the cluster.
>
>> Two options... recurring monitor actions with role=Stopped would
>> have caught this
>
> Even in maintenance mode? Wouldn't they have been cancelled just like
> the ordinary recurring monitor actions?

Good point. Perhaps they wouldn't.

> I guess adding them would run a recurring monitor operation for every
> resource on every node, only with different expectations, right?
>
>> or you can run crm_resource --cleanup after you've moved resources
>> around.
>
> I actually ran some crm resource cleanups for a couple of resources,
> and those really were not started on exiting maintenance mode.
>
>>>> Maintenance mode just prevents the cluster from doing anything
>>>> about it.
>>>
>>> Fine. So I should have restarted Pacemaker on each node before
>>> leaving maintenance mode, right? Or is there a better way?
>>
>> See above
>
> So crm_resource -r whatever -C is the way, for each resource
> separately. Is there no way to do this for all resources at once?

I think you can just drop the -r.
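For example (untested here; vm-web5 is just the resource name from your log, and option spellings may differ slightly between Pacemaker versions):

    # clean up the operation history of one resource
    crm_resource --resource vm-web5 --cleanup

    # or drop -r/--resource to clean up every resource at once
    crm_resource --cleanup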
>>> You say in the above thread that resource definitions can be changed:
>>> http://thread.gmane.org/gmane.linux.highavailability.user/39121/focus=39437
>>> Let me quote from there (starting with the words of Ulrich Windl):
>>>
>>>>>>> I think it's a common misconception that you can modify cluster
>>>>>>> resources while in maintenance mode:
>>>>>>
>>>>>> No, you _should_ be able to. If that's not the case, it's a bug.
>>>>>
>>>>> So the end of maintenance mode starts with a "re-probe"?
>>>>
>>>> No, but it doesn't need to.
>>>> The policy engine already knows if the resource definitions changed,
>>>> and the recurring monitor ops will find out if any are not running.
>>>
>>> My experiences show that you may not *move around* resources while
>>> in maintenance mode.
>>
>> Correct
>>
>>> That would indeed require a cluster-wide re-probe, which does not
>>> seem to happen (unless forced some way). Probably there was some
>>> misunderstanding in the above discussion; I guess Ulrich meant
>>> moving resources when he wrote "modifying cluster resources". Does
>>> this make sense?
>>
>> No, I'm reasonably sure he meant changing their definitions in the
>> cib. Or at least that's what I thought he meant at the time.
>
> Nobody could blame you for that, because that's what it means. But
> then he inquired about a "re-probe", which fits the problem of
> changing the status of resources rather than their definition.
> Actually, I was so firmly stuck in this mindset that at first I
> wanted to ask you to reconsider; your response felt so much out of
> place. That's all history now...
>
> After all this, I suggest clarifying this issue in the fine manual.
> I've read it a couple of times and still got the wrong impression.

Which specific section do you suggest?
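For reference, the two approaches mentioned above would look roughly like this in crm shell syntax. The VirtualDomain agent and its config path are only an illustration of what vm-web5 might be, the role=Stopped monitor must use an interval different from the regular monitor's (Pacemaker keys operations by name and interval), and the availability of --reprobe is an assumption about your crm_resource version:

    # option 1: a second recurring monitor that also checks the resource
    # stays stopped on the nodes where it is not supposed to run
    crm configure primitive vm-web5 ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/vm-web5.xml" \
        op monitor interval="30s" \
        op monitor interval="45s" role="Stopped"

    # option 2: after moving resources behind the cluster's back,
    # force a cluster-wide re-probe before leaving maintenance mode
    crm_resource --reprobe
    crm configure property maintenance-mode=false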