Network Stall when doing live migration

Greetings.

I have a testbed of two stock Ubuntu 22.04 LTS libvirt host installs using shared storage (MooseFS in this case, because it was readily available).

I have configured MooseFS as a 'dir' pool on each machine. Both hosts mount it at the same path, /MFS, using the MooseFS FUSE mount.
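For reference, a 'dir' pool on a FUSE mount like this can be described with a short libvirt pool definition. The sketch below only writes that definition to a file. The pool name `mfs` is a hypothetical placeholder; `/MFS` is the mount point described above.

```shell
# Hypothetical pool name; /MFS is the shared MooseFS FUSE mount point.
POOL_NAME=mfs
POOL_PATH=/MFS

# Write a minimal 'dir' pool definition pointing at the shared mount.
cat > "${POOL_NAME}-pool.xml" <<EOF
<pool type='dir'>
  <name>${POOL_NAME}</name>
  <target>
    <path>${POOL_PATH}</path>
  </target>
</pool>
EOF
```

On each host the file can then be registered with `virsh pool-define mfs-pool.xml`, followed by `virsh pool-start mfs` and `virsh pool-autostart mfs`.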

I am using an OVS bridge on each server to provide a live IP to the VMs. Each OVS bridge is attached to its own Ethernet card, and the two machines are connected to the same Cisco switch.

The Cisco switch ports are set up as trunks carrying a VLAN, and virsh connects the VM to the OVS bridge with that VLAN tag.
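A minimal sketch of the domain-side configuration described here: a bridge-type `<interface>` with an Open vSwitch `<virtualport>` and a `<vlan><tag>` element. The bridge name `ovsbr0` and VLAN ID `100` are hypothetical placeholders, not values from this setup.

```shell
# Hypothetical values; substitute the real OVS bridge name and VLAN ID.
BRIDGE=ovsbr0
VLAN_ID=100

# Write a libvirt <interface> fragment that attaches the guest NIC to an
# OVS bridge and tags its traffic with the given VLAN.
cat > ovs-vlan-iface.xml <<EOF
<interface type='bridge'>
  <source bridge='${BRIDGE}'/>
  <virtualport type='openvswitch'/>
  <vlan>
    <tag id='${VLAN_ID}'/>
  </vlan>
  <model type='virtio'/>
</interface>
EOF
```

The generated fragment can be compared against the `<interface>` element shown by `virsh dumpxml U22-TEST` on both hosts, or merged into the domain with `virsh edit U22-TEST`.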

I can install and boot up individual VMs on each Host with no problem.

I can migrate domains offline (--offline) from one host to the other using virsh migrate.

Note the --unsafe flag, which seems to be required and which prevents me from using Cockpit for migration.


virsh migrate U22-TEST qemu+ssh://x.x.x.126/system --unsafe --offline --persistent  --undefinesource --abort-on-error

followed by

virsh start U22-TEST

That works fine.

Now I am trying a live migration using:

virsh migrate U22-TEST qemu+ssh://x.x.x.126/system --unsafe --live --verbose --persistent  --undefinesource --abort-on-error

This works as well: I see the migration percentage climb, and at 100% the switchover happens, with the VM stopped on the source and running on the destination host. virsh console works at that point.

However, there is always a 2-3 minute period after the VM migrates (i.e. comes up on the destination host) during which networking is dead.

After that wait, the VM suddenly responds to ping, ports are open, etc. By then, any open SSH connections have usually timed out.

I assume this is some sort of ARP issue, but where: libvirt, OVS, or the Cisco switch?

Is there an additional step, flag, or even IOS config suggestion that I can use to limit the network downtime?
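One way to narrow down where stale forwarding state lives is to look up the guest's MAC address in each table along the path during the outage. The sketch below assumes an OVS bridge named `ovsbr0` (hypothetical) and the domain `U22-TEST`; the helper names `domain_macs` and `cisco_mac` are illustrative, and the OVS lookup needs root on the destination host.

```shell
# Assumed names (not from the original setup): OVS bridge "ovsbr0".
DOMAIN=U22-TEST
OVS_BRIDGE=ovsbr0

# Print the MAC column of `virsh domiflist` output
# (a header line and a dashed separator, then Interface Type Source Model MAC).
domain_macs() {
  awk 'NR > 2 && NF >= 5 { print $5 }'
}

# Convert 52:54:00:12:34:56 to the Cisco IOS dotted form 5254.0012.3456.
cisco_mac() {
  echo "$1" | tr -d ':' | tr 'A-F' 'a-f' | sed 's/\(....\)\(....\)\(....\)/\1.\2.\3/'
}

# Only run the lookups on a host where libvirt is installed.
if command -v virsh >/dev/null 2>&1; then
  for mac in $(virsh domiflist "$DOMAIN" | domain_macs); do
    echo "== $mac"
    # Has the local OVS bridge learned this MAC, and on which port?
    ovs-appctl fdb/show "$OVS_BRIDGE" | grep -i "$mac"
    # Equivalent check to run on the Cisco switch itself:
    echo "switch# show mac address-table address $(cisco_mac "$mac")"
  done
fi

# To test the ARP theory from inside the guest right after migration,
# iputils arping can send unsolicited ARP: arping -U -c 3 -I <iface> <guest-ip>
```

If the switch still lists the MAC on the source host's port while the guest is unreachable, the delay is likely in switch MAC learning rather than in the guest or libvirt.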

As a minor secondary issue: is there some additional XML element (e.g. <shared>) I can add to the storage pool XML to indicate that it really is shared storage, so that migration doesn't need the --unsafe flag?


-wk




