Hi

On Wednesday 19 of March 2014 21:26:56 Digimer wrote:
> On 19/03/14 07:45 PM, Digimer wrote:
> > On 19/03/14 06:31 PM, Chris Feist wrote:
> >> On 03/18/2014 08:27 PM, Digimer wrote:
> >>> Hi all,
> >>>
> >>> I would like to tell rgmanager to give more time for VMs to stop. I
> >>> want this:
> >>>
> >>> <vm name="vm01-win2008" domain="primary_n01" autostart="0"
> >>> path="/shared/definitions/" exclusive="0" recovery="restart"
> >>> max_restarts="2"
> >>> restart_expire_time="600">
> >>>
> >>> <action name="stop" timeout="10m" />
> >>>
> >>> </vm>
> >>>
> >>> I already use ccs to create the entry:
> >>>
> >>> <vm name="vm01-win2008" domain="primary_n01" autostart="0"
> >>> path="/shared/definitions/" exclusive="0" recovery="restart"
> >>> max_restarts="2"
> >>> restart_expire_time="600"/>
> >>>
> >>> via:
> >>>
> >>> ccs -h localhost --activate --sync --password "secret" \
> >>> --addvm vm01-win2008 \
> >>> --domain="primary_n01" \
> >>> path="/shared/definitions/" \
> >>> autostart="0" \
> >>> exclusive="0" \
> >>> recovery="restart" \
> >>> max_restarts="2" \
> >>> restart_expire_time="600"
> >>>
> >>> I'm hoping it's a simple additional switch. :)
> >>
> >> Unfortunately currently ccs doesn't support setting resource actions.
> >> However it's my understanding that rgmanager doesn't check timeouts
> >> unless __enforce_timeouts is set to "1". So you shouldn't be seeing a
> >> vm resource go to failed if it takes a long time to stop. Are you
> >> trying to make the vm resource fail if it takes longer than 10 minutes
> >> to stop?
> >
> > I was afraid you were going to say that. :(
> >
> > The problem is that after calling 'disable' against the VM service,
> > rgmanager waits two minutes. If the service isn't closed in that time,
> > the server is forced off (at least, this was the behaviour when I last
> > tested this).
> >
> > The concern is that, by default, windows installs queue updates to
> > install when the system shuts down.
> > During this time, windows makes it
> > very clear that you should not power off the system during the updates.
> > So if this timer is hit, and the VM is forced off, the guest OS can be
> > damaged.
> >
> > Of course, we can debate the (lack of) wisdom of this behaviour, and I
> > already document this concern (and even warn people to check for updates
> > before stopping the server), it's not sufficient. If a user doesn't read
> > the warning, or simply forgets to check, the consequences can be
> > non-trivial.
> >
> > If ccs can't be made to add this attribute, and if the behaviour
> > persists (I will test shortly after sending this reply), then I will
> > have to edit the cluster.conf directly, something I am loath to do if at
> > all avoidable.
> >
> > Cheers
>
> Confirmed;
>
> I called disable on a VM with gnome running, so that I could abort the
> VM's shut down.
>
> an-c05n01:~# date; clusvcadm -d vm:vm01-rhel6; date
> Wed Mar 19 21:06:29 EDT 2014
> Local machine disabling vm:vm01-rhel6...Success
> Wed Mar 19 21:08:36 EDT 2014
>
> 2 minutes and 7 seconds, then rgmanager forced-off the VM. Had this been
> a windows guest in the middle of installing updates, it would be highly
> likely to be screwed now.

Is this really the best way to handle such an event?

From what I remember, Windows can (or could, I don't have any 'modern' Windows
lying around) be told to shut down without updating. Maybe a wiser approach
would be to make the stop event (which I believe is delivered to the guest as
pressing the ACPI power button) trigger a shutdown without updates.
Keep in mind that doing system updates on a timer is dangerous, irrespective
of the actual time.

regards
Pavel Herrmann

> To confirm, I changed the config to:
>
> <vm autostart="0" domain="primary_n01" exclusive="0" max_restarts="2"
> name="vm01-rhel6" path="/shared/definitions/" recovery="restart"
> restart_expire_time="600">
> <action name="stop" timeout="10m"/>
> </vm>
>
> Then I repeated the test:
>
> an-c05n01:~# date; clusvcadm -d vm:vm01-rhel6; date
> Wed Mar 19 21:13:18 EDT 2014
> Local machine disabling vm:vm01-rhel6...Success
> Wed Mar 19 21:23:31 EDT 2014
>
> 10 minutes and 13 seconds before the cluster killed the server, much
> less likely to interrupt an in-progress OS update (truth be told, I plan
> to set 30 minutes).
>
> I understand that this blocks other processes, but in an HA environment,
> I'd strongly argue that safe > speed.
>
> digimer

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
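
For anyone who does end up editing cluster.conf by hand, as discussed in the quoted messages, here is a minimal sketch of that edit in Python. It is an illustration only, not something ccs or rgmanager provides. The function name set_vm_stop_timeout, the sample cluster name and the starting config_version are made up for the example. It also assumes the usual layout in which the <cluster> root carries config_version and the <vm> elements sit somewhere below it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the <vm> attributes come from the thread.
SAMPLE = """<cluster name="example-cluster" config_version="7">
  <rm>
    <vm name="vm01-rhel6" domain="primary_n01" autostart="0"
        path="/shared/definitions/" exclusive="0" recovery="restart"
        max_restarts="2" restart_expire_time="600"/>
  </rm>
</cluster>"""


def set_vm_stop_timeout(conf_xml: str, vm_name: str, timeout: str) -> str:
    """Return conf_xml with <action name="stop" timeout=...> set on one vm.

    Also increments config_version on the root element, since a changed
    cluster.conf needs a higher version than the one currently loaded.
    """
    root = ET.fromstring(conf_xml)

    # Find the <vm> with the requested name anywhere under the root.
    vm = next((v for v in root.iter("vm") if v.get("name") == vm_name), None)
    if vm is None:
        raise ValueError("no <vm> named %r in the configuration" % vm_name)

    # Reuse an existing stop action if present, otherwise create one.
    action = next(
        (a for a in vm.findall("action") if a.get("name") == "stop"), None
    )
    if action is None:
        action = ET.SubElement(vm, "action", {"name": "stop"})
    action.set("timeout", timeout)

    # Bump the version so the new file is not treated as stale.
    version = int(root.get("config_version", "0"))
    root.set("config_version", str(version + 1))

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_vm_stop_timeout(SAMPLE, "vm01-rhel6", "10m"))
```

The result still has to be validated and distributed to the other nodes in whatever way the cluster normally handles configuration changes; the sketch does none of that.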