samuel <samu60@...> writes:

> Hi all,
>
> This issue is also affecting us (CentOS 6.5 based Icehouse) and, as far
> as I could read, it comes from the fact that the path
> /var/lib/nova/instances (or whatever configuration path you have in
> nova.conf) is not shared. Nova does not see a shared path and therefore
> does not allow a live migration, although all the required information
> is stored in Ceph and in the local qemu state.
>
> Some people have "cheated" Nova into seeing this as a shared path, but
> I'm not confident about how this will affect stability.
>
> Can someone confirm this deduction? What are the possible workarounds
> for this situation in a fully Ceph-based environment (without a shared
> path)?

I got it to work finally.

Step 1 was double-checking nova.conf on the compute nodes. It was actually missing the flags pointed out earlier in this thread.

As for the /var/lib/nova/instances data, this gets transferred to the destination host as part of the migration. For that to work, you need to have the transport between the libvirtds set up correctly:

  libvirt_live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_PERSIST_DEST"
  live_migration_uri=qemu+ssh://nova@%s/system?keyfile=/var/lib/nova/.ssh/id_rsa

I did not want to open another TCP port on all the nodes, so I went with the SSH-based transport as described in the libvirtd documentation. For some reason it would only work once I explicitly added the user account (nova@...) and the location of the key file, even though both are the defaults.

As part of our deployment via Ansible we make sure the nova user has an up-to-date list of host keys in /var/lib/nova/.ssh/known_hosts. Otherwise you will get host key verification errors in /var/log/nova/nova-compute.log when you try to live migrate. Of course, the user needs to be present everywhere, have the same key everywhere, and have that key's public part in /var/lib/nova/.ssh/authorized_keys on every node, so the login works without user intervention.
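Spelled out by hand, that setup amounts to roughly the following (node01 and node02 are just example host names; the keypair is generated once and then pushed to every compute node):

  # done once: generate a passwordless keypair for the nova user
  ssh-keygen -t rsa -N '' -f id_rsa

  # on every compute node (as root): install the same keypair for nova
  install -d -m 700 -o nova -g nova /var/lib/nova/.ssh
  install -m 600 -o nova -g nova id_rsa /var/lib/nova/.ssh/id_rsa
  install -m 644 -o nova -g nova id_rsa.pub /var/lib/nova/.ssh/id_rsa.pub

  # authorize the key and pre-populate the host keys so host key
  # verification does not fail during live migration
  cat id_rsa.pub >> /var/lib/nova/.ssh/authorized_keys
  ssh-keyscan node01 node02 >> /var/lib/nova/.ssh/known_hosts
  chown nova:nova /var/lib/nova/.ssh/authorized_keys /var/lib/nova/.ssh/known_hosts
  chmod 600 /var/lib/nova/.ssh/authorized_keys

A quick "su -s /bin/sh -c 'ssh node02 true' nova" on node01 is an easy way to check that the login really works without any prompts.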
Setting this up alone brought me almost to my goal; the only thing I had missed was

  vncserver_listen = 0.0.0.0

in nova.conf -- this address is written into the virtual machine's libvirt.xml file as the address the machine listens on for its VNC console. On the bare-metal node where the VM was originally created, that works. However, when the VM gets migrated to another host (basically copying over the instance folder from /var/lib/nova/instances), the old address cannot be bound on the new bare-metal host and the migration fails. The log is pretty clear about that.

Once I had changed vncserver_listen, new machines could be migrated immediately. For existing ones, I have not tried whether editing the libvirt.xml file while they are running is in any way harmful, so I will wait until I can shut them down for a short maintenance window, then edit the file to replace the current listen address with 0.0.0.0 and bring them up again.

One more caveat: if you use the Horizon dashboard, there is a bug in the Icehouse release that prevents successful live migration on another level, because it uses the wrong names for the bare-metal machines. Instead of the compute service names (e.g. node01, node02, ... in my case), it uses the fully qualified hypervisor names, which does not work. See https://bugs.launchpad.net/horizon/+bug/1335999 for details. I applied the corresponding patch from https://git.openstack.org/cgit/openstack/horizon/patch/?id=89dc7de2e87b8d4e35837ad5122117aa2fb2c520 (excluding the tests, those do not match well enough).

Now I can live migrate from Horizon and the command line :)
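PS: For reference, one way to apply that patch while leaving out the test hunks; the dashboard path below is just an example, so adjust it to wherever Horizon lives on your system, and filterdiff comes from the patchutils package:

  # fetch the patch from the cgit URL above
  curl -o 1335999.patch 'https://git.openstack.org/cgit/openstack/horizon/patch/?id=89dc7de2e87b8d4e35837ad5122117aa2fb2c520'

  # drop the test hunks and apply the rest against the installed dashboard
  filterdiff -x '*tests*' 1335999.patch | patch -p1 -d /usr/share/openstack-dashboard

  # restart the web server so the change is picked up (httpd in our CentOS case)
  service httpd restart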