Re: Adding a stop timeout to a VM service using 'ccs'

On 19/03/14 07:45 PM, Digimer wrote:
On 19/03/14 06:31 PM, Chris Feist wrote:
On 03/18/2014 08:27 PM, Digimer wrote:
Hi all,

   I would like to tell rgmanager to give more time for VMs to stop. I
want this:

<vm name="vm01-win2008" domain="primary_n01" autostart="0"
path="/shared/definitions/" exclusive="0" recovery="restart"
max_restarts="2"
restart_expire_time="600">
   <action name="stop" timeout="10m" />
</vm>

I already use ccs to create the entry:

<vm name="vm01-win2008" domain="primary_n01" autostart="0"
path="/shared/definitions/" exclusive="0" recovery="restart"
max_restarts="2"
restart_expire_time="600"/>

via:

ccs -h localhost --activate --sync --password "secret" \
  --addvm vm01-win2008 \
  --domain="primary_n01" \
  path="/shared/definitions/" \
  autostart="0" \
  exclusive="0" \
  recovery="restart" \
  max_restarts="2" \
  restart_expire_time="600"

I'm hoping it's a simple additional switch. :)

Unfortunately, ccs doesn't currently support setting resource actions.
However it's my understanding that rgmanager doesn't check timeouts
unless __enforce_timeouts is set to "1".  So you shouldn't be seeing a
vm resource go to failed if it takes a long time to stop.  Are you
trying to make the vm resource fail if it takes longer than 10 minutes
to stop?

I was afraid you were going to say that. :(

The problem is that after calling 'disable' against the VM service,
rgmanager waits two minutes. If the service isn't closed in that time,
the server is forced off (at least, this was the behaviour when I last
tested this).

The concern is that, by default, Windows installs queue updates to be
applied when the system shuts down. During this time, Windows makes it
very clear that you should not power off the system during the updates.
So if this timer is hit, and the VM is forced off, the guest OS can be
damaged.

Of course, we can debate the (lack of) wisdom of this behaviour. I
already document this concern (and even warn people to check for updates
before stopping the server), but that's not sufficient. If a user doesn't
read the warning, or simply forgets to check, the consequences can be
non-trivial.

If ccs can't be made to add this attribute, and if the behaviour
persists (I will test shortly after sending this reply), then I will
have to edit the cluster.conf directly, something I am loath to do if at
all avoidable.

Cheers

Confirmed;

I called disable on a VM with GNOME running, so that I could abort the VM's shutdown.

an-c05n01:~# date; clusvcadm -d vm:vm01-rhel6; date
Wed Mar 19 21:06:29 EDT 2014
Local machine disabling vm:vm01-rhel6...Success
Wed Mar 19 21:08:36 EDT 2014

2 minutes and 7 seconds, then rgmanager forced off the VM. Had this been a Windows guest in the middle of installing updates, it would be highly likely to be screwed now.

To confirm, I changed the config to:

<vm autostart="0" domain="primary_n01" exclusive="0" max_restarts="2" name="vm01-rhel6" path="/shared/definitions/" recovery="restart" restart_expire_time="600">
  <action name="stop" timeout="10m"/>
</vm>
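
For anyone who would rather script this hand edit than open cluster.conf in an editor, here is a minimal sketch in Python. It is not something ccs or rgmanager provides; the function name set_vm_stop_timeout and the SAMPLE config are made up for illustration. It assumes the <vm> element can be found by its name attribute anywhere in the tree, and that the root <cluster> element carries a config_version attribute, which it bumps by one.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down cluster.conf used only for this illustration.
SAMPLE = """<cluster name="an-cluster" config_version="5">
  <rm>
    <vm name="vm01-rhel6" domain="primary_n01" autostart="0"
        path="/shared/definitions/" exclusive="0" recovery="restart"
        max_restarts="2" restart_expire_time="600"/>
  </rm>
</cluster>"""


def set_vm_stop_timeout(conf_xml, vm_name, timeout):
    """Return conf_xml with <action name="stop" timeout=...> set on one vm."""
    root = ET.fromstring(conf_xml)

    # Locate the <vm> element by its name attribute, wherever it sits.
    for vm in root.iter("vm"):
        if vm.get("name") == vm_name:
            break
    else:
        raise KeyError("no <vm> named %r in the config" % vm_name)

    # Reuse an existing stop action if present, otherwise create one.
    action = None
    for candidate in vm.findall("action"):
        if candidate.get("name") == "stop":
            action = candidate
    if action is None:
        action = ET.SubElement(vm, "action", {"name": "stop"})
    action.set("timeout", timeout)

    # Bump config_version so the change counts as a new config revision.
    version = int(root.get("config_version", "0"))
    root.set("config_version", str(version + 1))

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_vm_stop_timeout(SAMPLE, "vm01-rhel6", "10m"))
```

The result would still need to be written back, validated, and pushed to the other nodes with your usual tooling; that part is deliberately left out of the sketch.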

Then I repeated the test:

an-c05n01:~# date; clusvcadm -d vm:vm01-rhel6; date
Wed Mar 19 21:13:18 EDT 2014
Local machine disabling vm:vm01-rhel6...Success
Wed Mar 19 21:23:31 EDT 2014

10 minutes and 13 seconds before the cluster killed the server, which makes it much less likely to interrupt an in-progress OS update (truth be told, I plan to set 30 minutes).
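
The two elapsed times quoted above can be checked from the `date` output with a few lines of Python. This is only a sketch: the helper name elapsed_seconds is made up, it assumes both stamps are in the same timezone (so the "EDT" token is simply dropped), and it assumes an English locale for the day and month names.

```python
from datetime import datetime


def elapsed_seconds(start, end):
    """Seconds between two `date`-style stamps in the same timezone."""
    def parse(stamp):
        parts = stamp.split()
        # Drop the timezone abbreviation (fifth field); strptime's %Z does
        # not reliably understand names such as "EDT".
        del parts[4]
        return datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")

    return int((parse(end) - parse(start)).total_seconds())


if __name__ == "__main__":
    # Default behaviour: the first test run.
    print(elapsed_seconds("Wed Mar 19 21:06:29 EDT 2014",
                          "Wed Mar 19 21:08:36 EDT 2014"))  # 127
    # With the stop action's timeout set to "10m": the second test run.
    print(elapsed_seconds("Wed Mar 19 21:13:18 EDT 2014",
                          "Wed Mar 19 21:23:31 EDT 2014"))  # 613
```

127 seconds is the 2 minutes and 7 seconds of the first run, and 613 seconds is the 10 minutes and 13 seconds of the second.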

I understand that this blocks other processes, but in an HA environment, I'd strongly argue that safe > speed.

digimer

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster



