On 12/13/2024 3:08 PM, Chenyi Qiang wrote:
> As the commit 852f0048f3 ("RAMBlock: make guest_memfd require
> uncoordinated discard") highlighted, some subsystems like VFIO might
> disable ram block discard. However, guest_memfd relies on the discard
> operation to perform page conversion between private and shared memory.
> This can lead to a stale IOMMU mapping issue when assigning a hardware
> device to a confidential VM via shared memory (unprotected memory
> pages). Blocking shared page discard can solve this problem, but it
> could cause guests to consume twice the memory with VFIO, which is not
> acceptable in some cases. An alternative solution is to notify other
> subsystems like VFIO to refresh their outdated IOMMU mappings.
>
> RamDiscardManager is an existing concept (used by virtio-mem) to adjust
> VFIO mappings in relation to VM page assignment. Effectively page
> conversion is similar to hot-removing a page in one mode and adding it
> back in the other, so similar work that needs to happen in response
> to virtio-mem changes needs to happen for page conversion events.
> Introduce the RamDiscardManager to guest_memfd to achieve it.
>
> However, guest_memfd is not an object so it cannot directly implement
> the RamDiscardManager interface.
>
> One solution is to implement the interface in HostMemoryBackend. Any
> guest_memfd-backed host memory backend can register itself in the target
> MemoryRegion. However, this solution doesn't cover the scenario where a
> guest_memfd MemoryRegion doesn't belong to the HostMemoryBackend, e.g.
> the virtual BIOS MemoryRegion.
>
> Thus, choose the second option, i.e. define an object type named
> guest_memfd_manager with RamDiscardManager interface. Upon creation of
> guest_memfd, a new guest_memfd_manager object can be instantiated and
> registered to the managed guest_memfd MemoryRegion to handle the page
> conversion events.
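For readers following along, here is a rough standalone sketch of the idea
in plain C. It is not the code in this patch and does not use the QEMU
RamDiscardManager API: ToyManager, ToyListener and toy_convert() are made-up
names, and the 4K block size and 16-block region are assumptions chosen only
for illustration. It models one thing: converting a block to shared notifies
a listener (standing in for a subsystem such as VFIO) to map it, and
converting it back to private notifies the listener to unmap it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096u  /* assumed conversion granularity */
#define NUM_BLOCKS 16u    /* assumed size of the toy region, in blocks */

/* Hypothetical listener; only counts how many blocks it has "mapped". */
typedef struct ToyListener {
    int mapped;
} ToyListener;

/* Hypothetical manager: one state flag per block of the region. */
typedef struct ToyManager {
    bool shared[NUM_BLOCKS];  /* true = populated (shared), false = private */
    ToyListener *listener;    /* may be NULL if nobody registered */
} ToyManager;

/*
 * Convert [offset, offset + size) to shared or private, one block at a
 * time. Blocks already in the requested state are skipped, so the listener
 * is notified only for blocks whose state actually changes.
 * Returns 0 on success, -1 if the range is unaligned, empty or out of bounds.
 */
static int toy_convert(ToyManager *m, uint64_t offset, uint64_t size,
                       bool to_shared)
{
    assert(m != NULL);

    if (size == 0 || offset % BLOCK_SIZE || size % BLOCK_SIZE ||
        offset + size > (uint64_t)NUM_BLOCKS * BLOCK_SIZE) {
        return -1;
    }

    for (uint64_t off = offset; off < offset + size; off += BLOCK_SIZE) {
        size_t idx = (size_t)(off / BLOCK_SIZE);

        if (m->shared[idx] == to_shared) {
            continue;  /* no state change, nothing to notify */
        }
        m->shared[idx] = to_shared;

        if (m->listener) {
            /* populate -> listener maps the block, discard -> unmaps it */
            m->listener->mapped += to_shared ? 1 : -1;
        }
    }
    return 0;
}
```

With a zero-initialized ToyManager (every block private), converting the
first two blocks to shared leaves the listener with two mapped blocks, and
converting the second one back to private leaves it with one. Notifying per
block keeps map and unmap granularity identical in this toy model, which is
the property VFIO expects.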
>
> In the context of guest_memfd, the discarded state signifies that the
> page is private, while the populated state indicates that the page is
> shared. The state of the memory is tracked at the granularity of the
> host page size (i.e. block_size), as the minimum conversion size can be
> one page per request.
>
> In addition, VFIO expects the DMA mapping for a specific iova to be
> mapped and unmapped with the same granularity. However, the confidential
> VMs may do partial conversion, e.g. conversion happens on a small region
> within a large region. To prevent such invalid cases and before any
> potential optimization comes out, all operations are performed with 4K
> granularity.
>
> Signed-off-by: Chenyi Qiang <chenyi.qiang@xxxxxxxxx>
> ---
>  include/sysemu/guest-memfd-manager.h |  46 +++++
>  system/guest-memfd-manager.c         | 250 +++++++++++++++++++++++++++
>  system/meson.build                   |   1 +
>  3 files changed, 297 insertions(+)
>  create mode 100644 include/sysemu/guest-memfd-manager.h
>  create mode 100644 system/guest-memfd-manager.c
>
> diff --git a/include/sysemu/guest-memfd-manager.h b/include/sysemu/guest-memfd-manager.h
> new file mode 100644
> index 0000000000..ba4a99b614
> --- /dev/null
> +++ b/include/sysemu/guest-memfd-manager.h
> @@ -0,0 +1,46 @@
> +/*
> + * QEMU guest memfd manager
> + *
> + * Copyright Intel
> + *
> + * Author:
> + *      Chenyi Qiang <chenyi.qiang@xxxxxxxxx>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory
> + *
> + */
> +
> +#ifndef SYSEMU_GUEST_MEMFD_MANAGER_H
> +#define SYSEMU_GUEST_MEMFD_MANAGER_H
> +
> +#include "sysemu/hostmem.h"
> +
> +#define TYPE_GUEST_MEMFD_MANAGER "guest-memfd-manager"
> +
> +OBJECT_DECLARE_TYPE(GuestMemfdManager, GuestMemfdManagerClass, GUEST_MEMFD_MANAGER)
> +
> +struct GuestMemfdManager {
> +    Object parent;
> +
> +    /* Managed memory region. */
> +    MemoryRegion *mr;
> +
> +    /*
> +     * 1-setting of the bit represents the memory is populated (shared).
> +     */
> +    int32_t bitmap_size;
> +    unsigned long *bitmap;
> +
> +    /* block size and alignment */
> +    uint64_t block_size;
> +
> +    /* listeners to notify on populate/discard activity. */
> +    QLIST_HEAD(, RamDiscardListener) rdl_list;
> +};
> +
> +struct GuestMemfdManagerClass {
> +    ObjectClass parent_class;
> +};
> +
> +#endif
> diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c
> new file mode 100644
> index 0000000000..d7e105fead
> --- /dev/null
> +++ b/system/guest-memfd-manager.c
> @@ -0,0 +1,250 @@
> +/*
> + * QEMU guest memfd manager
> + *
> + * Copyright Intel
> + *
> + * Author:
> + *      Chenyi Qiang <chenyi.qiang@xxxxxxxxx>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/error-report.h"
> +#include "sysemu/guest-memfd-manager.h"
> +
> +OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES(GuestMemfdManager,
> +                                          guest_memfd_manager,
> +                                          GUEST_MEMFD_MANAGER,
> +                                          OBJECT,
> +                                          { TYPE_RAM_DISCARD_MANAGER },
> +                                          { })
> +

Fixup: Use OBJECT_DEFINE_TYPE_WITH_INTERFACES() instead of
OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES() as we define a class struct.

diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-manager.c
index 50802b34d7..f7dc93071a 100644
--- a/system/guest-memfd-manager.c
+++ b/system/guest-memfd-manager.c
@@ -15,12 +15,12 @@
 #include "qemu/error-report.h"
 #include "sysemu/guest-memfd-manager.h"
 
-OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES(GuestMemfdManager,
-                                          guest_memfd_manager,
-                                          GUEST_MEMFD_MANAGER,
-                                          OBJECT,
-                                          { TYPE_RAM_DISCARD_MANAGER },
-                                          { })
+OBJECT_DEFINE_TYPE_WITH_INTERFACES(GuestMemfdManager,
+                                   guest_memfd_manager,
+                                   GUEST_MEMFD_MANAGER,
+                                   OBJECT,
+                                   { TYPE_RAM_DISCARD_MANAGER },
+                                   { })
 
 static bool guest_memfd_rdm_is_populated(const RamDiscardManager *rdm,