Re: [LSF/MM/BPF TOPIC] BoF VM live migration over CXL memory

Hi Gregory-


On 4/12/23 11:34, Gregory Price wrote:
> On Wed, Apr 12, 2023 at 05:50:55PM +0200, David Hildenbrand wrote:

>> long-term: possibly forever, controlled by user space. In practice, anything
>> longer than ~10 seconds (best guess :)). There can be long-term pinnings
>> that are of very short duration; we just don't know what user space is up
>> to or when it will decide to unpin.
>>
>> Assume user space requests to trigger a read/write of a user space page to
>> a file: the page is pinned, DMA is started, and once DMA completes the page
>> is unpinned. Short-term. User space does not control how long the page
>> remains pinned.

>> In contrast:
>>
>> Example #1: mapping VM guest memory into an IOMMU using vfio for PCI
>> passthrough requires pinning the pages. Until user space decides to unmap
>> the pages from the IOMMU, the pages will remain pinned. -> long-term
>>
>> Example #2: mapping a user space address range into an IOMMU to repeatedly
>> perform RDMA using that address range requires pinning the pages. Until
>> user space decides to unregister that range, the pages remain pinned. ->
>> long-term
>>
>> Example #3: registering a user space address range with io_uring as a fixed
>> buffer, such that io_uring ops can avoid the page table walks by simply
>> using the pinned pages that were looked up once. As long as the fixed
>> buffer remains registered, the pages stay pinned. -> long-term
>>
>> --
>> Thanks,
>>
>> David / dhildenb
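
As an aside, the short-term/long-term split above is visible directly in
the pin_user_pages() API. A rough sketch, illustrative only -- exact
signatures vary across kernel versions, and the DMA setup is elided:

#include <linux/mm.h>

/*
 * Short-term: pin around a single DMA operation, then unpin. The pin
 * lifetime is bounded by the kernel, not by user space.
 */
static int short_term_dma(unsigned long uaddr, unsigned long nr_pages,
			  struct page **pages)
{
	long pinned = pin_user_pages(uaddr, nr_pages, FOLL_WRITE, pages);

	if (pinned <= 0)
		return pinned ? (int)pinned : -EFAULT;

	/* ... program the device, wait for DMA completion ... */

	unpin_user_pages(pages, pinned);
	return 0;
}

/*
 * Long-term (vfio / RDMA / io_uring fixed buffers): user space decides
 * when the pin ends, so FOLL_LONGTERM is required and the pages get
 * migrated off ZONE_MOVABLE/CMA before being pinned.
 */
static long long_term_pin(unsigned long uaddr, unsigned long nr_pages,
			  struct page **pages)
{
	return pin_user_pages(uaddr, nr_pages,
			      FOLL_WRITE | FOLL_LONGTERM, pages);
}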


> That pretty much precludes live migration from using CXL as a transport
> mechanism: since live migration is a user-initiated process, you would
> need what amounts to an atomic move between hosts to ensure pages are
> not left pinned.

Do you really need an atomic move between hosts? I mean, it's not really a
failure if you are in the process of migrating pages onto the switched CXL
memory and one of the pages is pulled out of CXL and back onto the
hypervisor. The running VM's CPUs can do loads and stores from either, so
the VM keeps running; it's not affected. It's just that your migration is
potentially "stalled" or "canceled". You only encounter issues when all
your pages are on CXL and the other hypervisor is pulling pages out.
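
To put the "stalled, not failed" model in pseudo-C -- every type and
helper below is hypothetical, this is just the state machine I have in
mind:

/* Hypothetical VM handle; nothing like this exists today. */
struct vm;

enum mig_state { MIG_STALLED, MIG_DONE };

static enum mig_state migrate_vm_to_cxl(struct vm *vm)
{
	unsigned long pfn;

	for_each_vm_page(vm, pfn) {		/* hypothetical iterator */
		if (!move_page_to_cxl(vm, pfn)) {	/* hypothetical */
			/*
			 * The page was pulled back onto hypervisor DRAM.
			 * The vCPUs can still load/store it, so the VM
			 * keeps running; only the migration stalls.
			 */
			return MIG_STALLED;
		}
	}
	/* All pages on CXL: the destination can start claiming them. */
	return MIG_DONE;
}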


> The more I'm reading, the more I'm somewhat convinced CXL memory should
> not allow pinning at all.

I think you want to be able to somehow pin the pages on one hypervisor and
unpin them on the other, or in some other way "pass ownership" between the
hypervisors. Right? Because of the scenario I mention above: if your source
hypervisor takes a page out of CXL, then your destination hypervisor has a
hole in the VM's address space and can't run it.
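
Concretely (and again purely hypothetically), that handoff could live in
the shared CXL window itself, e.g. a per-page owner word both hypervisors
agree on:

#include <stdatomic.h>
#include <stdint.h>

#define OWNER_NONE 0u

/* Hypothetical per-page descriptor living in the switched CXL memory,
 * visible to both hypervisors. */
struct cxl_page_desc {
	_Atomic uint32_t owner;		/* hypervisor id, or OWNER_NONE */
};

/* Source side: release the page once its contents are on CXL. */
static void release_page(struct cxl_page_desc *d)
{
	atomic_store_explicit(&d->owner, OWNER_NONE, memory_order_release);
}

/* Destination side: claim the page. Failure means the source still owns
 * it (e.g., it pulled the page back out), so wait or retry. */
static int claim_page(struct cxl_page_desc *d, uint32_t my_id)
{
	uint32_t expected = OWNER_NONE;

	return atomic_compare_exchange_strong_explicit(
			&d->owner, &expected, my_id,
			memory_order_acquire, memory_order_relaxed);
}

Whether a cross-host atomic like this is actually coherent is exactly the
kind of CXL 3.0 hardware question we would need to pin down first.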

> I suppose you could implement a new RDMA feature where the remote host's
> CXL memory is temporarily mapped, data is migrated, and then that area
> is unmapped. Basically the exact same RDMA mechanism, but using memory
> instead of the network. This would make the operation kernel-controlled,
> if pin/unpin is required.

That would move us from the shared-memory parts of the CXL 3.0 spec into
the sections on direct memory placement, I think. Which, in order of
preference, is #2 for me personally, and a "backup" plan if #1 (shared
memory) doesn't pan out.
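
If we do end up there, I picture the kernel-side shape looking roughly
like this -- every function below is hypothetical, the point is only that
the map/pin lifetime is bounded by the kernel rather than user space
(short-term, in David's terms):

#include <linux/types.h>

struct remote_cxl_window;	/* hypothetical handle to the peer's region */

static int cxl_push_range(struct remote_cxl_window *win, void *src,
			  size_t len, unsigned long dst_off)
{
	void *dst;
	int ret;

	dst = cxl_window_map(win, dst_off, len);	/* hypothetical */
	if (!dst)
		return -ENOMEM;

	ret = copy_to_cxl(dst, src, len);	/* hypothetical memcpy + flush */

	cxl_window_unmap(win, dst, len);	/* mapping (and any pin) ends here */
	return ret;
}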

> Lots to talk about.
>
> ~Gregory


--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla




