Re: Time to work on a Plan: Moving VM's

Kevin Fenzi <kevin@xxxxxxxxx> · Fri, 7 Aug 2015 10:49:39 -0600

On Thu, 6 Aug 2015 13:49:41 -0600
Stephen John Smoogen <smooge@xxxxxxxxx> wrote:

> We have new hardware in to replace some of our 4+ year old IBM x3650's
> and need to do so in the next month or so to make sure we have a good
> list of hardware to go onto extended warranty this fall.
> 
> I would like to come up with a plan of attack on getting them all
> moved by September 1st to virthost19->virthost22.

...snip...

> There are a couple of ways we do these transitions.
> 
> 1) Spin up a new virtual machine with an incremented hostname:
> Example:
> a) check to see which ask systems exist.  (ask01, ask02)
> b) create a new virtual machine with an incremented number: ask03
> c) ansible the system to be clone of ask01
> d) either turn off ask01 and rename ask03 to be ask01
> OR
> d) configure other servers to point to ask03 instead of ask01
> e) fix problems as needed
> f) shutdown and remove ask01.
> 
> 2) Move virtual machine to another server.
> a) Schedule a downtime
> b) Shutdown the server
> c) network dd the lvm image to other server.
> d) copy over the /etc/libvirt/qemu/___.xml file over to other server.
> e) spin up server
> f) fix problems as needed
> g) remove files from old server
> 
> 3) If the image is on an iscsi share versus local disks...
> a) shutdown the image on server A.
> b) copy the xml files over to server B.
> c) get libvirt to see them.
> d) start the image on server B
> e) remove the xml files from server A.
> 
> It looks like none of the servers in question are on the iscsi share
> so we won't be able to do 3. [Unless there is one or two that are good
> candidates to be on the iscsi share... then a variant of 2 would be
> used.]
> 
> Downtimes except for the fas system will be in the 20 minute range.
> The fas database might be 1-3 hours due to a 100 GB image to be copied
> over and the usual 'what we have to reboot that because this was down?
> WHY?' problems we end up with.
> 
> Other plans and ideas can be replied to here.
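
For reference, option 2 in the quoted plan (cold move with a network dd) can be written down as an ordered command list. This is only a sketch under assumptions: the volume group name and the short hostnames in the usage line are hypothetical, the destination logical volume is assumed to already exist at the same size, and paths are assumed to contain no characters that need extra quoting on the remote side. Step g (removing files from the old server) is left out on purpose so it stays a manual step after the guest has been checked.

```python
import shlex


def move_commands(guest, lv_path, xml_path, dest_host):
    """Return shell command strings for a cold move of one guest, in order."""
    q = shlex.quote
    return [
        # b) shut the guest down on the old host (virsh returns before the
        #    guest is fully off, so wait for it to stop before copying)
        f"virsh shutdown {q(guest)}",
        # c) stream the LVM image to the same path on the new host
        f"dd if={q(lv_path)} bs=1M | ssh {q(dest_host)} dd of={q(lv_path)} bs=1M",
        # d) copy the libvirt definition across
        f"scp {q(xml_path)} {q(dest_host)}:{q(xml_path)}",
        # e) register and start the guest on the new host
        f"ssh {q(dest_host)} virsh define {q(xml_path)}",
        f"ssh {q(dest_host)} virsh start {q(guest)}",
    ]


# Hypothetical example values; only the /etc/libvirt/qemu/ directory
# comes from the plan itself.
for cmd in move_commands(
    "hotness01",
    "/dev/vg_guests/hotness01",
    "/etc/libvirt/qemu/hotness01.xml",
    "virthost19",
):
    print(cmd)
```

Printing the commands rather than running them keeps this usable as a checklist during the scheduled downtime.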

There's a bit of a hybrid plan between 1 and 2 we could also use. 

All the virthost10 ones could just be moved anytime since they are
staging.

All of these: 

virthost05.phx2.fedoraproject.org bastion02.phx2.fedoraproject.org
virthost05.phx2.fedoraproject.org proxy01.phx2.fedoraproject.org
virthost06.phx2.fedoraproject.org ask01.phx2.fedoraproject.org
virthost06.phx2.fedoraproject.org notifs-web02.phx2.fedoraproject.org
virthost07.phx2.fedoraproject.org datagrepper02.phx2.fedoraproject.org
virthost07.phx2.fedoraproject.org elections01.phx2.fedoraproject.org
virthost07.phx2.fedoraproject.org nuancier01.phx2.fedoraproject.org
virthost08.phx2.fedoraproject.org ns03.phx2.fedoraproject.org
virthost09.phx2.fedoraproject.org fedocal01.phx2.fedoraproject.org
virthost09.phx2.fedoraproject.org nuancier02.phx2.fedoraproject.org

Have other active instances, so they could be stopped on those hosts
and new versions of them created by ansible. That should just be a
matter of changing the info in their host_vars to build on a new
virthost and updating ssh host keys.
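
A rough way to draft the rebuild targets for that list is sketched below. The round-robin spread over virthost19 through virthost22 is purely an assumption for illustration, not something decided in this thread, and it does not check where each guest's other active instance lives, so the result would still need a look by hand. Hostnames are shortened by dropping the domain suffix.

```python
from itertools import cycle

# Guests that have other active instances, copied from the list above
# as "current virthost, guest" pairs (domain suffix dropped for brevity).
REDUNDANT = """\
virthost05 bastion02
virthost05 proxy01
virthost06 ask01
virthost06 notifs-web02
virthost07 datagrepper02
virthost07 elections01
virthost07 nuancier01
virthost08 ns03
virthost09 fedocal01
virthost09 nuancier02
"""

# virthost19 through virthost22, the new hosts named in the original plan.
NEW_HOSTS = [f"virthost{n}" for n in range(19, 23)]


def plan_rebuilds(listing, new_hosts):
    """Map each guest to (old_host, new_host), assigning new hosts round-robin."""
    targets = cycle(new_hosts)
    plan = {}
    for line in listing.splitlines():
        if not line.strip():
            continue
        old_host, guest = line.split()
        plan[guest] = (old_host, next(targets))
    return plan


for guest, (old, new) in plan_rebuilds(REDUNDANT, NEW_HOSTS).items():
    print(f"{guest}: {old} -> {new}")
```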

All of these: 

virthost07.phx2.fedoraproject.org hotness01.phx2.fedoraproject.org
virthost08.phx2.fedoraproject.org darkserver01.phx2.fedoraproject.org
virthost09.phx2.fedoraproject.org busgateway01.phx2.fedoraproject.org

Don't have other active instances, but I think a short outage while we
shut them down and rebuild on another host would probably be ok. 

This one: 

virthost05.phx2.fedoraproject.org db-fas01.phx2.fedoraproject.org

however, I think we should make a db-fas02 and sync data and cut over
to it, so as to keep downtime low. 
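
For a feel of why a straight copy of db-fas01 is the slow path, here is a small back-of-the-envelope calculation using the 100 GB image size from the original mail. The transfer rates are hypothetical sustained figures, not measurements from these hosts, and the result covers only the raw copy, not the restarts and fixes that follow.

```python
def copy_minutes(size_gb, mb_per_s):
    """Minutes to stream size_gb gigabytes at a sustained mb_per_s (1 GB = 1000 MB)."""
    return size_gb * 1000 / mb_per_s / 60


# Hypothetical sustained rates in MB/s, slowest to fastest.
for rate in (30, 60, 110):
    print(f"{rate:>4} MB/s: {copy_minutes(100, rate):.0f} min")
```

With a db-fas02 that is synced while db-fas01 stays up, the copy time drops out of the outage window and only the final cutover counts as downtime.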

kevin