On 3/14/2025 8:11 PM, Gupta, Pankaj wrote: > On 3/10/2025 9:18 AM, Chenyi Qiang wrote: >> As the commit 852f0048f3 ("RAMBlock: make guest_memfd require >> uncoordinated discard") highlighted, some subsystems like VFIO may >> disable ram block discard. However, guest_memfd relies on the discard >> operation to perform page conversion between private and shared memory. >> This can lead to stale IOMMU mapping issue when assigning a hardware >> device to a confidential VM via shared memory. To address this, it is >> crucial to ensure systems like VFIO refresh its IOMMU mappings. >> >> RamDiscardManager is an existing concept (used by virtio-mem) to adjust >> VFIO mappings in relation to VM page assignment. Effectively page >> conversion is similar to hot-removing a page in one mode and adding it >> back in the other. Therefore, similar actions are required for page >> conversion events. Introduce the RamDiscardManager to guest_memfd to >> facilitate this process. >> >> Since guest_memfd is not an object, it cannot directly implement the >> RamDiscardManager interface. One potential attempt is to implement it in >> HostMemoryBackend. This is not appropriate because guest_memfd is per >> RAMBlock. Some RAMBlocks have a memory backend but others do not. In >> particular, the ones like virtual BIOS calling >> memory_region_init_ram_guest_memfd() do not. >> >> To manage the RAMBlocks with guest_memfd, define a new object named >> MemoryAttributeManager to implement the RamDiscardManager interface. The > > Isn't this should be the other way around. 'MemoryAttributeManager' > should be an interface and RamDiscardManager a type of it, an > implementation? We want to use 'MemoryAttributeManager' to represent RAMBlock to implement the RamDiscardManager interface callbacks because RAMBlock is not an object. It includes some metadata of guest_memfd like shared_bitmap at the same time. I can't get it that make 'MemoryAttributeManager' an interface and RamDiscardManager a type of it. Can you elaborate it a little bit? I think at least we need someone to implement the RamDiscardManager interface. > > MemoryAttributeManager have the data like 'shared_bitmap' etc that > information can also be used by the other implementation types? Shared_bitmap is just the metadata of guest_memfd. Other subsystems may consult its status by RamDiscardManager interface like ram_discard_manager_is_populated(). > > Or maybe I am getting it entirely wrong. > > Thanks, > Pankaj > >> object stores guest_memfd information such as shared_bitmap, and handles >> page conversion notification. Using the name of MemoryAttributeManager is >> aimed to make it more generic. The term "Memory" emcompasses not only RAM >> but also private MMIO in TEE I/O, which might rely on this >> object/interface to handle page conversion events in the future. The >> term "Attribute" allows for the management of various attributes beyond >> shared and private. For instance, it could support scenarios where >> discard vs. populated and shared vs. private states co-exists, such as >> supporting virtio-mem or something similar in the future. >> >> In the current context, MemoryAttributeManager signifies discarded state >> as private and populated state as shared. Memory state is tracked at the >> host page size granularity, as the minimum memory conversion size can >> be one >> page per request. Additionally, VFIO expects the DMA mapping for a >> specific iova to be mapped and unmapped with the same granularity. >> Confidential VMs may perform partial conversions, e.g. conversion >> happens on a small region within a large region. To prevent such invalid >> cases and until cut_mapping operation support is introduced, all >> operations are performed with 4K granularity. >> >> Signed-off-by: Chenyi Qiang <chenyi.qiang@xxxxxxxxx> >> --- >> Changes in v3: >> - Some rename (bitmap_size->shared_bitmap_size, >> first_one/zero_bit->first_bit, etc.) >> - Change shared_bitmap_size from uint32_t to unsigned >> - Return mgr->mr->ram_block->page_size in get_block_size() >> - Move set_ram_discard_manager() up to avoid a g_free() in failure >> case. >> - Add const for the memory_attribute_manager_get_block_size() >> - Unify the ReplayRamPopulate and ReplayRamDiscard and related >> callback. >> >> Changes in v2: >> - Rename the object name to MemoryAttributeManager >> - Rename the bitmap to shared_bitmap to make it more clear. >> - Remove block_size field and get it from a helper. In future, we >> can get the page_size from RAMBlock if necessary. >> - Remove the unncessary "struct" before GuestMemfdReplayData >> - Remove the unncessary g_free() for the bitmap >> - Add some error report when the callback failure for >> populated/discarded section. >> - Move the realize()/unrealize() definition to this patch. >> --- >> include/system/memory-attribute-manager.h | 42 ++++ >> system/memory-attribute-manager.c | 283 ++++++++++++++++++++++ >> system/meson.build | 1 + >> 3 files changed, 326 insertions(+) >> create mode 100644 include/system/memory-attribute-manager.h >> create mode 100644 system/memory-attribute-manager.c >> >> diff --git a/include/system/memory-attribute-manager.h b/include/ >> system/memory-attribute-manager.h >> new file mode 100644 >> index 0000000000..23375a14b8 >> --- /dev/null >> +++ b/include/system/memory-attribute-manager.h >> @@ -0,0 +1,42 @@ >> +/* >> + * QEMU memory attribute manager >> + * >> + * Copyright Intel >> + * >> + * Author: >> + * Chenyi Qiang <chenyi.qiang@xxxxxxxxx> >> + * >> + * This work is licensed under the terms of the GNU GPL, version 2 or >> later. >> + * See the COPYING file in the top-level directory >> + * >> + */ >> + >> +#ifndef SYSTEM_MEMORY_ATTRIBUTE_MANAGER_H >> +#define SYSTEM_MEMORY_ATTRIBUTE_MANAGER_H >> + >> +#include "system/hostmem.h" >> + >> +#define TYPE_MEMORY_ATTRIBUTE_MANAGER "memory-attribute-manager" >> + >> +OBJECT_DECLARE_TYPE(MemoryAttributeManager, >> MemoryAttributeManagerClass, MEMORY_ATTRIBUTE_MANAGER) >> + >> +struct MemoryAttributeManager { >> + Object parent; >> + >> + MemoryRegion *mr; >> + >> + /* 1-setting of the bit represents the memory is populated >> (shared) */ >> + unsigned shared_bitmap_size; >> + unsigned long *shared_bitmap; >> + >> + QLIST_HEAD(, RamDiscardListener) rdl_list; >> +}; >> + >> +struct MemoryAttributeManagerClass { >> + ObjectClass parent_class; >> +}; >> + >> +int memory_attribute_manager_realize(MemoryAttributeManager *mgr, >> MemoryRegion *mr); >> +void memory_attribute_manager_unrealize(MemoryAttributeManager *mgr); >> + >> +#endif >> diff --git a/system/memory-attribute-manager.c b/system/memory- >> attribute-manager.c >> new file mode 100644 >> index 0000000000..7c3789cf49 >> --- /dev/null >> +++ b/system/memory-attribute-manager.c >> @@ -0,0 +1,283 @@ >> +/* >> + * QEMU memory attribute manager >> + * >> + * Copyright Intel >> + * >> + * Author: >> + * Chenyi Qiang <chenyi.qiang@xxxxxxxxx> >> + * >> + * This work is licensed under the terms of the GNU GPL, version 2 or >> later. >> + * See the COPYING file in the top-level directory >> + * >> + */ >> + >> +#include "qemu/osdep.h" >> +#include "qemu/error-report.h" >> +#include "exec/ramblock.h" >> +#include "system/memory-attribute-manager.h" >> + >> +OBJECT_DEFINE_TYPE_WITH_INTERFACES(MemoryAttributeManager, >> + memory_attribute_manager, >> + MEMORY_ATTRIBUTE_MANAGER, >> + OBJECT, >> + { TYPE_RAM_DISCARD_MANAGER }, >> + { }) >> + >> +static size_t memory_attribute_manager_get_block_size(const >> MemoryAttributeManager *mgr) >> +{ >> + /* >> + * Because page conversion could be manipulated in the size of at >> least 4K or 4K aligned, >> + * Use the host page size as the granularity to track the memory >> attribute. >> + */ >> + g_assert(mgr && mgr->mr && mgr->mr->ram_block); >> + g_assert(mgr->mr->ram_block->page_size == >> qemu_real_host_page_size()); >> + return mgr->mr->ram_block->page_size; >> +} >> + >> + >> +static bool memory_attribute_rdm_is_populated(const RamDiscardManager >> *rdm, >> + const >> MemoryRegionSection *section) >> +{ >> + const MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm); >> + const int block_size = memory_attribute_manager_get_block_size(mgr); >> + uint64_t first_bit = section->offset_within_region / block_size; >> + uint64_t last_bit = first_bit + int128_get64(section->size) / >> block_size - 1; >> + unsigned long first_discard_bit; >> + >> + first_discard_bit = find_next_zero_bit(mgr->shared_bitmap, >> last_bit + 1, first_bit); >> + return first_discard_bit > last_bit; >> +} >> + >> +typedef int (*memory_attribute_section_cb)(MemoryRegionSection *s, >> void *arg); >> + >> +static int memory_attribute_notify_populate_cb(MemoryRegionSection >> *section, void *arg) >> +{ >> + RamDiscardListener *rdl = arg; >> + >> + return rdl->notify_populate(rdl, section); >> +} >> + >> +static int memory_attribute_notify_discard_cb(MemoryRegionSection >> *section, void *arg) >> +{ >> + RamDiscardListener *rdl = arg; >> + >> + rdl->notify_discard(rdl, section); >> + >> + return 0; >> +} >> + >> +static int memory_attribute_for_each_populated_section(const >> MemoryAttributeManager *mgr, >> + >> MemoryRegionSection *section, >> + void *arg, >> + >> memory_attribute_section_cb cb) >> +{ >> + unsigned long first_bit, last_bit; >> + uint64_t offset, size; >> + const int block_size = memory_attribute_manager_get_block_size(mgr); >> + int ret = 0; >> + >> + first_bit = section->offset_within_region / block_size; >> + first_bit = find_next_bit(mgr->shared_bitmap, mgr- >> >shared_bitmap_size, first_bit); >> + >> + while (first_bit < mgr->shared_bitmap_size) { >> + MemoryRegionSection tmp = *section; >> + >> + offset = first_bit * block_size; >> + last_bit = find_next_zero_bit(mgr->shared_bitmap, mgr- >> >shared_bitmap_size, >> + first_bit + 1) - 1; >> + size = (last_bit - first_bit + 1) * block_size; >> + >> + if (!memory_region_section_intersect_range(&tmp, offset, >> size)) { >> + break; >> + } >> + >> + ret = cb(&tmp, arg); >> + if (ret) { >> + error_report("%s: Failed to notify RAM discard listener: >> %s", __func__, >> + strerror(-ret)); >> + break; >> + } >> + >> + first_bit = find_next_bit(mgr->shared_bitmap, mgr- >> >shared_bitmap_size, >> + last_bit + 2); >> + } >> + >> + return ret; >> +} >> + >> +static int memory_attribute_for_each_discarded_section(const >> MemoryAttributeManager *mgr, >> + >> MemoryRegionSection *section, >> + void *arg, >> + >> memory_attribute_section_cb cb) >> +{ >> + unsigned long first_bit, last_bit; >> + uint64_t offset, size; >> + const int block_size = memory_attribute_manager_get_block_size(mgr); >> + int ret = 0; >> + >> + first_bit = section->offset_within_region / block_size; >> + first_bit = find_next_zero_bit(mgr->shared_bitmap, mgr- >> >shared_bitmap_size, >> + first_bit); >> + >> + while (first_bit < mgr->shared_bitmap_size) { >> + MemoryRegionSection tmp = *section; >> + >> + offset = first_bit * block_size; >> + last_bit = find_next_bit(mgr->shared_bitmap, mgr- >> >shared_bitmap_size, >> + first_bit + 1) - 1; >> + size = (last_bit - first_bit + 1) * block_size; >> + >> + if (!memory_region_section_intersect_range(&tmp, offset, >> size)) { >> + break; >> + } >> + >> + ret = cb(&tmp, arg); >> + if (ret) { >> + error_report("%s: Failed to notify RAM discard listener: >> %s", __func__, >> + strerror(-ret)); >> + break; >> + } >> + >> + first_bit = find_next_zero_bit(mgr->shared_bitmap, mgr- >> >shared_bitmap_size, >> + last_bit + 2); >> + } >> + >> + return ret; >> +} >> + >> +static uint64_t memory_attribute_rdm_get_min_granularity(const >> RamDiscardManager *rdm, >> + const >> MemoryRegion *mr) >> +{ >> + MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm); >> + >> + g_assert(mr == mgr->mr); >> + return memory_attribute_manager_get_block_size(mgr); >> +} >> + >> +static void memory_attribute_rdm_register_listener(RamDiscardManager >> *rdm, >> + RamDiscardListener >> *rdl, >> + >> MemoryRegionSection *section) >> +{ >> + MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm); >> + int ret; >> + >> + g_assert(section->mr == mgr->mr); >> + rdl->section = memory_region_section_new_copy(section); >> + >> + QLIST_INSERT_HEAD(&mgr->rdl_list, rdl, next); >> + >> + ret = memory_attribute_for_each_populated_section(mgr, section, rdl, >> + >> memory_attribute_notify_populate_cb); >> + if (ret) { >> + error_report("%s: Failed to register RAM discard listener: >> %s", __func__, >> + strerror(-ret)); >> + } >> +} >> + >> +static void >> memory_attribute_rdm_unregister_listener(RamDiscardManager *rdm, >> + >> RamDiscardListener *rdl) >> +{ >> + MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm); >> + int ret; >> + >> + g_assert(rdl->section); >> + g_assert(rdl->section->mr == mgr->mr); >> + >> + ret = memory_attribute_for_each_populated_section(mgr, rdl- >> >section, rdl, >> + >> memory_attribute_notify_discard_cb); >> + if (ret) { >> + error_report("%s: Failed to unregister RAM discard listener: >> %s", __func__, >> + strerror(-ret)); >> + } >> + >> + memory_region_section_free_copy(rdl->section); >> + rdl->section = NULL; >> + QLIST_REMOVE(rdl, next); >> + >> +} >> + >> +typedef struct MemoryAttributeReplayData { >> + ReplayRamStateChange fn; >> + void *opaque; >> +} MemoryAttributeReplayData; >> + >> +static int memory_attribute_rdm_replay_cb(MemoryRegionSection >> *section, void *arg) >> +{ >> + MemoryAttributeReplayData *data = arg; >> + >> + return data->fn(section, data->opaque); >> +} >> + >> +static int memory_attribute_rdm_replay_populated(const >> RamDiscardManager *rdm, >> + MemoryRegionSection >> *section, >> + ReplayRamStateChange >> replay_fn, >> + void *opaque) >> +{ >> + MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm); >> + MemoryAttributeReplayData data = { .fn = replay_fn, .opaque = >> opaque }; >> + >> + g_assert(section->mr == mgr->mr); >> + return memory_attribute_for_each_populated_section(mgr, section, >> &data, >> + >> memory_attribute_rdm_replay_cb); >> +} >> + >> +static int memory_attribute_rdm_replay_discarded(const >> RamDiscardManager *rdm, >> + MemoryRegionSection >> *section, >> + ReplayRamStateChange >> replay_fn, >> + void *opaque) >> +{ >> + MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(rdm); >> + MemoryAttributeReplayData data = { .fn = replay_fn, .opaque = >> opaque }; >> + >> + g_assert(section->mr == mgr->mr); >> + return memory_attribute_for_each_discarded_section(mgr, section, >> &data, >> + >> memory_attribute_rdm_replay_cb); >> +} >> + >> +int memory_attribute_manager_realize(MemoryAttributeManager *mgr, >> MemoryRegion *mr) >> +{ >> + uint64_t shared_bitmap_size; >> + const int block_size = qemu_real_host_page_size(); >> + int ret; >> + >> + shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size; >> + >> + mgr->mr = mr; >> + ret = memory_region_set_ram_discard_manager(mgr->mr, >> RAM_DISCARD_MANAGER(mgr)); >> + if (ret) { >> + return ret; >> + } >> + mgr->shared_bitmap_size = shared_bitmap_size; >> + mgr->shared_bitmap = bitmap_new(shared_bitmap_size); >> + >> + return ret; >> +} >> + >> +void memory_attribute_manager_unrealize(MemoryAttributeManager *mgr) >> +{ >> + g_free(mgr->shared_bitmap); >> + memory_region_set_ram_discard_manager(mgr->mr, NULL); >> +} >> + >> +static void memory_attribute_manager_init(Object *obj) >> +{ >> + MemoryAttributeManager *mgr = MEMORY_ATTRIBUTE_MANAGER(obj); >> + >> + QLIST_INIT(&mgr->rdl_list); >> +} >> + >> +static void memory_attribute_manager_finalize(Object *obj) >> +{ >> +} >> + >> +static void memory_attribute_manager_class_init(ObjectClass *oc, void >> *data) >> +{ >> + RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc); >> + >> + rdmc->get_min_granularity = >> memory_attribute_rdm_get_min_granularity; >> + rdmc->register_listener = memory_attribute_rdm_register_listener; >> + rdmc->unregister_listener = >> memory_attribute_rdm_unregister_listener; >> + rdmc->is_populated = memory_attribute_rdm_is_populated; >> + rdmc->replay_populated = memory_attribute_rdm_replay_populated; >> + rdmc->replay_discarded = memory_attribute_rdm_replay_discarded; >> +} > > Would this initialization be for >> diff --git a/system/meson.build b/system/meson.build >> index 4952f4b2c7..ab07ff1442 100644 >> --- a/system/meson.build >> +++ b/system/meson.build >> @@ -15,6 +15,7 @@ system_ss.add(files( >> 'dirtylimit.c', >> 'dma-helpers.c', >> 'globals.c', >> + 'memory-attribute-manager.c', >> 'memory_mapping.c', >> 'qdev-monitor.c', >> 'qtest.c', >