Hi Daniel,

I am happy that Libvirt is pursuing local migration/live patching support, but at the same time I am wondering what changed from what you said here: https://www.redhat.com/archives/libvir-list/2017-September/msg00489.html

For background: live patching enhancements have been in the IBM backlog for a few years now, and one of the reasons they were postponed time and time again was the lack of Libvirt support and the stated direction that "Libvirt is not interested in supporting it". The message above was being used internally as the rationale for that.

Thanks,

DHB

On 2/3/20 9:43 AM, Daniel P. Berrangé wrote:
I'm (re-)sending this patch series on behalf of Shaju Abraham <shaju.abraham@xxxxxxxxxxx>, who has tried to send it several times already. Red Hat's email infrastructure is broken, accepting the mails and then failing to deliver them to mailman or any other Red Hat address. Unfortunately it means that while we can send comments back to Shaju on this thread, subscribers will then probably fail to see any responses Shaju tries to give :-( To say this is bad is an understatement. I have yet another ticket open tracking & escalating this awful problem but can't give any ETA on a fix :-(

Anyway, with that out of the way, here's Shaju's original cover letter below....

1) What is this patch series about?

Local live migration of a VM means live migrating a VM instance within the same node. Traditional libvirt live migration moves the VM from a source node to a remote node; local migrations are forbidden in Libvirt for a myriad of reasons. This patch series enables local migration in Libvirt.

2) Why is local migration important?

The ability to live migrate a VM locally paves the way for hypervisor upgrades without shutting down the VM. For example, to upgrade qemu after a security fix, we can locally migrate the VM to the new qemu instance. By utilising capabilities like "bypass-shared-memory" in qemu, the hypervisor upgrades are faster.

3) Why is local migration difficult in Libvirt?

Libvirt always assumes that the name/UUID pair is unique within a node. During local migration there will be two different VMs with the same UUID/name pair, which will confuse the management stack. There are other path variables, such as the monitor path and config paths, which assume that the name/UUID pair is unique, so during migration the same monitor would be used by both the source and the target. We cannot assign a temporary UUID to the target VM, since the UUID is part of the machine ABI, which is immutable.
To decouple the dependency on UUID/name, a new field (the domain id) is included in all the paths that Libvirt uses. This ensures that every instance of the VM gets a unique path.

4) How is local migration designed?

Libvirt manages all the VM domain objects using two hash tables, indexed by UUID and by name. During live migration the domain entry on the source node gets deleted and a new entry gets populated on the target node, indexed by the same name/UUID. But for local migration there is no remote node: the source and the target nodes are the same. So, in order to model the remote node, two more hash tables are introduced which represent the hash tables of the remote node during migration.

Libvirt migration involves 5 stages:

1) Begin
2) Prepare
3) Perform
4) Finish
5) Confirm

Begin, Perform and Confirm execute on the source node, whereas Prepare and Finish execute on the target node. In the case of local migration, the Perform and Finish stages use the newly introduced 'remote hash tables' and the rest of the stages use the 'source hash tables'. Once the migration is completed, that is after the Confirm phase, the VM domain object is moved from the 'remote hash tables' to the 'source hash tables'. This is required so that other Libvirt commands like 'virsh list' can display all the VMs running on the node.

5) How to test local migration?

A new flag, 'local', is added to the 'virsh migrate' command to enable local migration. The syntax is:

virsh migrate --live --local 'domain-id' qemu+ssh://ip-address/system

6) What are the known issues?

SELinux policy is known to have issues with creating /dev/hugepages entries during VM launch. In order to test local migration, disable SELinux using 'setenforce 0'.
Shaju Abraham (6):
  Add VIR_MIGRATE_LOCAL flag to virsh migrate command
  Introduce remote hash tables and helper routines
  Add local migration support in QEMU Migration framework
  Modify close callback routines to handle local migration
  Make PATHs unique for a VM object instance
  Move the domain object from remote to source hash table

 include/libvirt/libvirt-domain.h |   6 +
 src/conf/virdomainobjlist.c      | 232 +++++++++++++++++++++++++++++--
 src/conf/virdomainobjlist.h      |  10 ++
 src/libvirt_private.syms         |   4 +
 src/qemu/qemu_conf.c             |   4 +-
 src/qemu/qemu_domain.c           |  28 +++-
 src/qemu/qemu_domain.h           |   2 +
 src/qemu/qemu_driver.c           |  46 +++++-
 src/qemu/qemu_migration.c        |  59 +++++---
 src/qemu/qemu_migration.h        |   5 +
 src/qemu/qemu_migration_cookie.c | 121 ++++++++--------
 src/qemu/qemu_migration_cookie.h |   2 +
 src/qemu/qemu_process.c          |   3 +-
 src/qemu/qemu_process.h          |   2 +
 src/util/virclosecallbacks.c     |  48 +++++--
 src/util/virclosecallbacks.h     |   3 +
 tools/virsh-domain.c             |   7 +
 17 files changed, 471 insertions(+), 111 deletions(-)