On 26.07.24 07:02, Tian, Kevin wrote:
>> From: David Hildenbrand <david@xxxxxxxxxx>
>> Sent: Thursday, July 25, 2024 10:04 PM
>>
>>> Open
>>> ====
>>> Implementing a RamDiscardManager to notify VFIO of page conversions
>>> causes changes in semantics: private memory is treated as discarded
>>> (or hot-removed) memory. This isn't aligned with the expectation of
>>> current RamDiscardManager users (e.g. VFIO or live migration), who
>>> really expect that discarded memory is hot-removed and can thus be
>>> skipped when processing guest memory. Treating private memory as
>>> discarded won't work in the future if VFIO or live migration needs
>>> to handle private memory. E.g., VFIO may need to map private memory
>>> to support Trusted IO, and live migration for confidential VMs
>>> needs to migrate private memory.
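
(For context: a RamDiscardManager publishes populate/discard
transitions of a RAM region to registered listeners. A simplified
sketch of the listener callbacks, based on QEMU's
include/exec/memory.h:

  typedef int (*NotifyRamPopulate)(RamDiscardListener *rdl,
                                   MemoryRegionSection *section);
  typedef void (*NotifyRamDiscard)(RamDiscardListener *rdl,
                                   MemoryRegionSection *section);

VFIO registers such a listener and DMA-maps on populate / DMA-unmaps
on discard; see hw/vfio/common.c. With this series, notify_discard()
would fire on a shared->private conversion even though the page still
exists, which is the semantic mismatch described above.)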
"VFIO may need to map private memory to support Trusted IO"
I've been told that the way we handle shared memory won't be the way
this is going to work with guest_memfd. KVM will coordinate directly
with VFIO or $whatever and update the IOMMU tables itself right in the
kernel; the pages are pinned/owned by guest_memfd, so that will just
work. So I don't consider that currently a concern. guest_memfd private
memory is not mapped into user page tables and as it currently seems it
never will be.

> Or we could extend MAP_DMA to accept guest_memfd+offset in place of
> 'vaddr' and have VFIO/IOMMUFD call guest_memfd helpers to retrieve
> the pinned pfn.

In theory yes, and I've been thinking of the same for a while, until
people told me that it is unlikely to work that way in the future.
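
Just to illustrate what I had been thinking of (purely hypothetical;
the flag and the guest_memfd fields below do not exist in the UAPI):

  /* Hypothetical extension of struct vfio_iommu_type1_dma_map. */
  struct vfio_iommu_type1_dma_map {
          __u32   argsz;
          __u32   flags;
  #define VFIO_DMA_MAP_FLAG_GUEST_MEMFD (1 << 3)  /* hypothetical */
          __u64   vaddr;              /* ignored with the new flag */
          __u64   iova;
          __u64   size;
          __s32   guest_memfd;        /* hypothetical: fd ...      */
          __u64   guest_memfd_offset; /* ... + offset, not a vaddr */
  };

VFIO/IOMMUFD would then ask guest_memfd for the pinned pfns instead of
GUP'ing user page tables.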

> IMHO it's more the TIO architecture that decides whether
> VFIO/IOMMUFD needs to manage the mapping of private memory, rather
> than the use of guest_memfd.
>
> E.g. SEV-TIO, IIUC, introduces a new layer of page-ownership
> tracking (the RMP) to check the HPA after the IOMMU walks the
> existing I/O page tables. So VFIO/IOMMUFD could reasonably continue
> to manage those I/O page tables, covering both private and shared
> memory, with a hint about where to find the pfn (host page table or
> guest_memfd).
>
> But TDX Connect introduces a new I/O page table format (same as
> secure EPT) for mapping private memory, and further requires sharing
> the secure EPT between CPU and IOMMU for private memory. Then it
> appears to be a different story.

Yes. This seems to be the future, and more in line with
in-place/in-kernel conversion as, e.g., pKVM wants to have it. If you
want to avoid user space altogether when doing shared<->private
conversions, then letting user space manage the IOMMUs is not going
to work.

If we ever have to go down that path (MAP_DMA of guest_memfd), we
could have two RamDiscardManagers for a RAM region, just like we have
two memory backends: one for shared memory populate/discard (what
this series tries to achieve) and one for private memory
populate/discard; see the sketch below.
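
Roughly (a hypothetical sketch only; today a RAM region has at most a
single RamDiscardManager, and the names below are invented):

  /* Two trackers with inverted semantics over the same RAM region. */
  typedef struct GuestMemfdManagers {
      RamDiscardManager *shared_rdm;  /* "populated" == shared     */
      RamDiscardManager *private_rdm; /* "populated" == private    */
      unsigned long *shared_bitmap;   /* private is the complement */
  } GuestMemfdManagers;

A shared->private conversion would then notify_discard() the shared
listeners (e.g., VFIO unmapping shared DMA) and notify_populate() the
private listeners (e.g., whatever ends up mapping private memory).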

The thing is that private memory will always have to be special-cased
all over the place either way, unfortunately.
--
Cheers,
David / dhildenb