Re: KVM+Ceph: Live migration of I/O-heavy VM

Ronny Aasen <ronny+ceph-users@xxxxxxxx> · Tue, 11 Dec 2018 18:19:35 +0100

On 11.12.2018 12:59, Kevin Olbrich wrote:
> Hi!
>
> I am currently planning the migration of a large VM (MS Exchange, 300
> mailboxes and a 900 GB database) from qcow2 on ext4 (RAID1) to an
> all-flash Ceph Luminous cluster (which already holds lots of images).
> The server has access to both local and cluster storage; I only need
> to live-migrate the storage, not the machine.
>
> I have never used live migration, as it can cause more issues, and the
> VMs that have already been migrated had planned downtime.
> Taking the VM offline and converting/importing it with qemu-img would
> take some hours, but I would like to keep serving clients, even if
> service is slower.
>
> The VM is I/O-heavy relative to the old storage (LSI/Adaptec with
> BBU). Two HDDs are bound as RAID1 and are constantly at 30% to 60%
> load (this goes up to 100% during reboots, updates, or login
> prime-time).
>
> What happens when either the local compute node or the Ceph cluster
> fails (degraded)? Or when the network is unavailable?
> Are all writes performed to both locations? Is this fail-safe? Or does
> the VM crash in the worst case, which can lead to a dirty shutdown of
> the MS Exchange databases?

The disk stays on the source location until the migration is finalized.
If the local compute node crashes and the VM dies with it before the
migration is done, the disk is on the source location, as expected. If
nodes in the Ceph cluster die but the cluster is still operational, Ceph
just self-heals and the migration finishes. If the cluster dies hard
enough to actually break, the migration will time out and abort, and the
disk remains on the source location. If the network is unavailable, the
transfer will also time out.
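
To make the failure cases above easier to follow, here is a toy Python model of them. It is only a sketch under stated assumptions: the event names and the `final_disk_location` function are hypothetical illustrations, not a QEMU, libvirt, or Ceph API, and the model assumes the VM is switched to the Ceph copy only by an explicit finalize step at the end of the mirror job.

```python
from enum import Enum


class Location(Enum):
    SOURCE = "local qcow2 on ext4"
    CEPH = "RBD image on the Ceph cluster"


# Hypothetical event names, chosen for illustration only.
CONTINUE = {"copy_ok", "ceph_degraded"}  # degraded Ceph keeps serving I/O and self-heals
ABORT = {"ceph_broken", "network_down"}  # the job times out and aborts
HOST_CRASH = "host_crash"                # the VM dies with the local node
FINALIZE = "finalize"                    # explicit switch-over to the new disk


def final_disk_location(events):
    """Return where the VM's authoritative disk lives after a sequence of events.

    Toy model: the source stays authoritative until the migration is
    explicitly finalized, so every failure before that point leaves the
    disk on the source.
    """
    for event in events:
        if event in CONTINUE:
            continue
        if event in ABORT or event == HOST_CRASH:
            return Location.SOURCE
        if event == FINALIZE:
            return Location.CEPH
        raise ValueError(f"unknown event: {event!r}")
    # The copy never reached the finalize step, so nothing was switched over.
    return Location.SOURCE


if __name__ == "__main__":
    print(final_disk_location(["copy_ok", "ceph_degraded", "copy_ok", "finalize"]))  # Location.CEPH
    print(final_disk_location(["copy_ok", "network_down", "finalize"]))  # Location.SOURCE
```

The point the model captures is that the switch-over is a single step at the very end; before it, the VM depends only on the source disk.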

Good luck

Ronny Aasen

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com