Re: Live migration support for Cloud-Hypervisor VMs

Thanks for the details and recommendations, Daniel!


On 8/2/2022 11:19 AM, Daniel P. Berrangé wrote:
> On Mon, Aug 01, 2022 at 11:03:49AM -0500, Praveen K Paladugu wrote:
> > Folks,
> >
> > We are implementing Live Migration support in the "ch" driver of Libvirt.
> > I'd like to confirm whether the approach we have chosen would be accepted
> > upstream once implemented.
> >
> >
> > Our immediate goal is to implement the "Hypervisor Native" + "Managed
> > Direct" mode of migration. "Hypervisor Native" here refers to the VMM (ch)
> > being responsible for the data flow. This is in contrast to TUNNELLED
> > migration, where data is sent over libvirt RPC.

> Avoiding TUNNELLED migration is a very good idea. This was a short-term
> hack to work around the lack of TLS support in QEMU. It is more efficient
> to have TLS natively integrated in the hypervisor layer than in libvirt.
>
> IOW, "Hypervisor native" is a good choice.
>
>
> > "Managed Direct" refers to the virsh client being responsible for the
> > control flow between the source and dest hosts. The libvirtd daemons on
> > source and destination do not have to communicate with each other. These
> > modes are described further at
> > https://libvirt.org/migration.html#network-data-transports.

> I'd caution that I think 'managed direct' migration leaves you with
> fewer good options for ensuring resilience of the migration.
>
> IOW, if the client application goes away, I think it'll be harder
> for the libvirt CH driver to recover from that scenario.
>
> Also, if a client app is using the DigitalOcean 'go-libvirt' API
> instead of our 'libvirt-go-module' API, things are even more
> limited, since the 'go-libvirt' API speaks directly to the RPC
> protocol, bypassing the libvirt.so logic related to the migration
> process steps.
>
> With the peer-to-peer mode, migration can carry on even if the
> client app goes away, since the client app isn't part of the
> control loop.
>
> So overall, I'd encourage peer-to-peer migration as the preferable
> option, unless you can hand off absolutely everything to the CH
> code and not have libvirt involved in orchestrating the migration
> steps at all?

Makes sense to prioritize peer-to-peer migration. Our current project is an
internship with strict time constraints. As we are already well under way
with "Managed Direct" mode, we will finish it and then move on to
peer-to-peer migration right after.

> > At the moment, Cloud-Hypervisor supports receiving migration data only on
> > Unix Domain Sockets. Also, Cloud-Hypervisor does not encrypt the VM data
> > while sending.

> Hmm, that's quite limiting.


> > We are considering forking "socat" processes as documented at
> > https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/live_migration.md.
> > The socat processes will be forked in the "Prepare" and "Perform" phases
> > on the destination and source hosts respectively.
> >
> > I couldn't find any existing implementation in libvirt to connect domain
> > sockets on different hosts. Please let me know if you'd recommend a
> > different approach from forking socat processes to connect the domain
> > sockets on the source and dest hosts to allow live VM migration.

> I think building something around socat will get you going quickly, but
> ultimately be harmful over the long term.

Makes sense. We were also concerned about long-term maintenance, so we wanted
to check on this mailing list. Since there isn't a better existing mechanism
to connect domain sockets on the source and dest hosts, we will finish up the
"socat"-based implementation and get it working end-to-end.
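
The socat bridge described above essentially relays bytes in both directions between the local Unix domain socket that Cloud-Hypervisor uses and a TCP connection to the other host. Below is a minimal Python sketch of that relay loop, assuming both sockets are already connected; `pump` and `relay` are hypothetical helper names, not part of libvirt, socat or Cloud-Hypervisor.

```python
import socket
import threading


def pump(src: socket.socket, dst: socket.socket, bufsize: int = 65536) -> int:
    """Copy bytes from src to dst until src hits EOF; return bytes copied."""
    total = 0
    while True:
        chunk = src.recv(bufsize)
        if not chunk:  # peer closed its write side
            break
        dst.sendall(chunk)  # every byte takes a trip through user space
        total += len(chunk)
    # Propagate EOF so the far end sees the stream finish.
    try:
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass
    return total


def relay(a: socket.socket, b: socket.socket) -> tuple[int, int]:
    """Relay traffic both ways between a and b, similar to a socat bridge.

    Returns (bytes copied a->b, bytes copied b->a).
    """
    counts = [0, 0]

    def run(idx: int, src: socket.socket, dst: socket.socket) -> None:
        counts[idx] = pump(src, dst)

    # One direction in a helper thread, the other in the calling thread.
    t = threading.Thread(target=run, args=(1, b, a))
    t.start()
    run(0, a, b)
    t.join()
    return counts[0], counts[1]
```

Each direction is a recv/sendall loop in user space, which is exactly the extra copy a proxy adds on top of whatever the VMM itself does.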

> Our experience with QEMU has been that to maximise performance you need
> the lowest level in full control. These days QEMU can open multiple TCP
> connections concurrently from multiple threads, so that throughput isn't
> limited by the data copy performance of a single CPU. It also has the
> ability to take advantage of kernel features like zerocopy. Use of a socat
> proxy is going to add many data copies to the transport, which can only
> harm your performance.
>
> So my recommendation would be to invest time first in extending CH so
> that it natively supports opening TCP connections, and then take advantage
> of that in libvirt from the start. You then have the right basic foundation
> on which to add features like TLS, zerocopy, multi-connection, and more.
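
The multi-connection approach mentioned above can be pictured as striping the migration stream across several sockets. The Python sketch below is illustrative only: the (offset, length) framing and the names `send_striped` and `recv_striped` are hypothetical and do not describe QEMU's multifd wire format or any Cloud-Hypervisor API. Each chunk carries its own offset, so the receiver can place it correctly regardless of which connection it arrived on.

```python
import socket
import struct
import threading

# Hypothetical frame header: chunk offset (u64) + chunk length (u32), network byte order.
HEADER = struct.Struct("!QI")


def _send_worker(sock: socket.socket, data: bytes, offsets: list, chunk: int) -> None:
    for off in offsets:
        piece = data[off:off + chunk]
        sock.sendall(HEADER.pack(off, len(piece)) + piece)
    sock.shutdown(socket.SHUT_WR)  # tell the receiver this channel is done


def send_striped(data: bytes, socks: list, chunk: int = 4096) -> None:
    """Deal chunks round-robin across the sockets, one sender thread per socket."""
    offsets = list(range(0, len(data), chunk))
    threads = [
        threading.Thread(target=_send_worker,
                         args=(s, data, offsets[i::len(socks)], chunk))
        for i, s in enumerate(socks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def _recv_exact(sock: socket.socket, n: int):
    """Read exactly n bytes; return None if EOF arrives before any byte."""
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            if buf:
                raise ConnectionError("connection closed mid-frame")
            return None
        buf += part
    return bytes(buf)


def _recv_worker(sock: socket.socket, out: bytearray) -> None:
    while True:
        hdr = _recv_exact(sock, HEADER.size)
        if hdr is None:
            break  # sender shut down this channel
        off, length = HEADER.unpack(hdr)
        payload = _recv_exact(sock, length)
        if payload is None:
            raise ConnectionError("connection closed before payload")
        out[off:off + length] = payload  # each chunk lands at its own offset


def recv_striped(socks: list, total: int) -> bytes:
    """Receive on all sockets in parallel and reassemble into one buffer."""
    out = bytearray(total)
    threads = [threading.Thread(target=_recv_worker, args=(s, out)) for s in socks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bytes(out)
```

Python threads will not show the per-CPU scaling described above; the sketch only shows the structure (independent channels, self-describing chunks) that lets a native implementation spread the copying work across CPUs.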


Again, thanks for the details and the recommendation. Enabling TCP
connections and other low-level features in Cloud-Hypervisor isn't something
we can tackle within our current time constraints. However, we will follow
up with the Cloud-Hypervisor community and open a tracking issue for this
work.

> With regards,
> Daniel

--
Regards,
Praveen K Paladugu




