On 21/1/25 00:06, David Hildenbrand wrote:
On 10.01.25 06:13, Chenyi Qiang wrote:
On 1/9/2025 5:32 PM, Alexey Kardashevskiy wrote:
On 9/1/25 16:34, Chenyi Qiang wrote:
On 1/8/2025 12:47 PM, Alexey Kardashevskiy wrote:
On 13/12/24 18:08, Chenyi Qiang wrote:
Introduce the realize()/unrealize() callbacks to initialize/
uninitialize
the new guest_memfd_manager object and register/unregister it in the
target MemoryRegion.
Guest_memfd was initially set to shared until the commit bd3bcf6962
("kvm/memory: Make memory type private by default if it has guest
memfd
backend"). To align with this change, the default state in
guest_memfd_manager is set to private. (The bitmap is cleared to 0).
Additionally, setting the default to private can also reduce the
overhead of mapping shared pages into IOMMU by VFIO during the bootup
stage.
Signed-off-by: Chenyi Qiang <chenyi.qiang@xxxxxxxxx>
---
include/sysemu/guest-memfd-manager.h | 27 +++++++++++++++++++++++
++++
system/guest-memfd-manager.c | 28 +++++++++++++++++++++++
++++-
system/physmem.c | 7 +++++++
3 files changed, 61 insertions(+), 1 deletion(-)
diff --git a/include/sysemu/guest-memfd-manager.h b/include/sysemu/
guest-memfd-manager.h
index 9dc4e0346d..d1e7f698e8 100644
--- a/include/sysemu/guest-memfd-manager.h
+++ b/include/sysemu/guest-memfd-manager.h
@@ -42,6 +42,8 @@ struct GuestMemfdManager {
struct GuestMemfdManagerClass {
ObjectClass parent_class;
+ void (*realize)(GuestMemfdManager *gmm, MemoryRegion *mr,
uint64_t region_size);
+ void (*unrealize)(GuestMemfdManager *gmm);
int (*state_change)(GuestMemfdManager *gmm, uint64_t offset,
uint64_t size,
bool shared_to_private);
};
@@ -61,4 +63,29 @@ static inline int
guest_memfd_manager_state_change(GuestMemfdManager *gmm, uint6
return 0;
}
+static inline void guest_memfd_manager_realize(GuestMemfdManager
*gmm,
+ MemoryRegion *mr,
uint64_t region_size)
+{
+ GuestMemfdManagerClass *klass;
+
+ g_assert(gmm);
+ klass = GUEST_MEMFD_MANAGER_GET_CLASS(gmm);
+
+ if (klass->realize) {
+ klass->realize(gmm, mr, region_size);
Ditch realize() hook and call guest_memfd_manager_realizefn()
directly?
Not clear why these new hooks are needed.
+ }
+}
+
+static inline void guest_memfd_manager_unrealize(GuestMemfdManager
*gmm)
+{
+ GuestMemfdManagerClass *klass;
+
+ g_assert(gmm);
+ klass = GUEST_MEMFD_MANAGER_GET_CLASS(gmm);
+
+ if (klass->unrealize) {
+ klass->unrealize(gmm);
+ }
+}
guest_memfd_manager_unrealizefn()?
Agree. Adding these wrappers seem unnecessary.
+
#endif
diff --git a/system/guest-memfd-manager.c b/system/guest-memfd-
manager.c
index 6601df5f3f..b6a32f0bfb 100644
--- a/system/guest-memfd-manager.c
+++ b/system/guest-memfd-manager.c
@@ -366,6 +366,31 @@ static int
guest_memfd_state_change(GuestMemfdManager *gmm, uint64_t offset,
return ret;
}
+static void guest_memfd_manager_realizefn(GuestMemfdManager
*gmm,
MemoryRegion *mr,
+ uint64_t region_size)
+{
+ uint64_t bitmap_size;
+
+ gmm->block_size = qemu_real_host_page_size();
+ bitmap_size = ROUND_UP(region_size, gmm->block_size) / gmm-
block_size;
imho unaligned region_size should be an assert.
There's no guarantee the region_size of the MemoryRegion is PAGE_SIZE
aligned. So the ROUND_UP() is more appropriate.
It is all about DMA so the smallest you can map is PAGE_SIZE so even if
you round up here, it is likely going to fail to DMA-map later anyway
(or not?).
Checked the handling of VFIO, if the size is less than PAGE_SIZE, it
will just return and won't do DMA-map.
Here is a different thing. It tries to calculate the bitmap_size. The
bitmap is used to track the private/shared status of the page. So if the
size is less than PAGE_SIZE, we still use the one bit to track this
small-size range.
+
+ gmm->mr = mr;
+ gmm->bitmap_size = bitmap_size;
+ gmm->bitmap = bitmap_new(bitmap_size);
+
+ memory_region_set_ram_discard_manager(gmm->mr,
RAM_DISCARD_MANAGER(gmm));
+}
This belongs to 2/7.
+
+static void guest_memfd_manager_unrealizefn(GuestMemfdManager *gmm)
+{
+ memory_region_set_ram_discard_manager(gmm->mr, NULL);
+
+ g_free(gmm->bitmap);
+ gmm->bitmap = NULL;
+ gmm->bitmap_size = 0;
+ gmm->mr = NULL;
@gmm is being destroyed here, why bother zeroing?
OK, will remove it.
+}
+
This function belongs to 2/7.
Will move both realizefn() and unrealizefn().
Yes.
static void guest_memfd_manager_init(Object *obj)
{
GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(obj);
@@ -375,7 +400,6 @@ static void guest_memfd_manager_init(Object *obj)
static void guest_memfd_manager_finalize(Object *obj)
{
- g_free(GUEST_MEMFD_MANAGER(obj)->bitmap);
}
static void guest_memfd_manager_class_init(ObjectClass *oc,
void
*data)
@@ -384,6 +408,8 @@ static void
guest_memfd_manager_class_init(ObjectClass *oc, void *data)
RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);
gmmc->state_change = guest_memfd_state_change;
+ gmmc->realize = guest_memfd_manager_realizefn;
+ gmmc->unrealize = guest_memfd_manager_unrealizefn;
rdmc->get_min_granularity =
guest_memfd_rdm_get_min_granularity;
rdmc->register_listener = guest_memfd_rdm_register_listener;
diff --git a/system/physmem.c b/system/physmem.c
index dc1db3a384..532182a6dd 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -53,6 +53,7 @@
#include "sysemu/hostmem.h"
#include "sysemu/hw_accel.h"
#include "sysemu/xen-mapcache.h"
+#include "sysemu/guest-memfd-manager.h"
#include "trace.h"
#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
@@ -1885,6 +1886,9 @@ static void ram_block_add(RAMBlock *new_block,
Error **errp)
qemu_mutex_unlock_ramlist();
goto out_free;
}
+
+ GuestMemfdManager *gmm =
GUEST_MEMFD_MANAGER(object_new(TYPE_GUEST_MEMFD_MANAGER));
+ guest_memfd_manager_realize(gmm, new_block->mr, new_block-
mr->size);
Wow. Quite invasive.
Yeah... It creates a manager object no matter whether the user wants to
us e shared passthru or not. We assume some fields like
private/shared
bitmap may also be helpful in other scenario for future usage, and
if no
passthru device, the listener would just return, so it is acceptable.
Explain these other scenarios in the commit log please as otherwise
making this an interface of HostMemoryBackendMemfd looks way cleaner.
Thanks,
Thanks for the suggestion. Until now, I think making this an interface
of HostMemoryBackend is cleaner. The potential future usage for
non-HostMemoryBackend guest_memfd-backed memory region I can think of is
the the TEE I/O for iommufd P2P support? when it tries to initialize RAM
device memory region with the attribute of shared/private. But I think
it would be a long term story and we are not sure what it will be like
in future.
As raised in #2, I'm don't think this belongs into HostMemoryBackend. It
kind-of belongs to the RAMBlock, but we could have another object
(similar to virtio-mem currently managing a single
HostMemoryBackend->RAMBlock) that takes care of that for multiple memory
backends.
The vBIOS thingy confused me and then I confused others :) There are 2
things:
1) an interface or new subclass of HostMemoryBackendClass which we need
to advertise and implement ability to discard pages;
2) RamDiscardManagerClass which is MR/Ramblock and does not really
belong to HostMemoryBackend (as it is in what was posted ages ago).
I suggest Chenyi post a new version using the current approach with the
comments and commitlogs fixed. Makes sense? Thanks,
--
Alexey