On 1/15/2025 12:06 PM, Alexey Kardashevskiy wrote:
> On 10/1/25 17:38, Chenyi Qiang wrote:
>> On 1/10/2025 8:58 AM, Alexey Kardashevskiy wrote:
>>> On 9/1/25 15:29, Chenyi Qiang wrote:
>>>> On 1/9/2025 10:55 AM, Alexey Kardashevskiy wrote:
>>>>> On 9/1/25 13:11, Chenyi Qiang wrote:
>>>>>> On 1/8/2025 7:20 PM, Alexey Kardashevskiy wrote:
>>>>>>> On 8/1/25 21:56, Chenyi Qiang wrote:
>>>>>>>> On 1/8/2025 12:48 PM, Alexey Kardashevskiy wrote:
>>>>>>>>> On 13/12/24 18:08, Chenyi Qiang wrote:
>>>>>>>>>> As the commit 852f0048f3 ("RAMBlock: make guest_memfd require
>>>>>>>>>> uncoordinated discard") highlighted, some subsystems like VFIO might
>>>>>>>>>> disable ram block discard. However, guest_memfd relies on the discard
>>>>>>>>>> operation to perform page conversion between private and shared memory.
>>>>>>>>>> This can lead to a stale IOMMU mapping issue when assigning a hardware
>>>>>>>>>> device to a confidential VM via shared memory (unprotected memory
>>>>>>>>>> pages). Blocking shared page discard can solve this problem, but it
>>>>>>>>>> could cause guests to consume twice the memory with VFIO, which is not
>>>>>>>>>> acceptable in some cases. An alternative solution is to notify other
>>>>>>>>>> systems like VFIO to refresh their outdated IOMMU mappings.
>>>>>>>>>>
>>>>>>>>>> RamDiscardManager is an existing concept (used by virtio-mem) to adjust
>>>>>>>>>> VFIO mappings in relation to VM page assignment. Effectively, page
>>>>>>>>>> conversion is similar to hot-removing a page in one mode and adding it
>>>>>>>>>> back in the other, so the same kind of work that needs to happen in
>>>>>>>>>> response to virtio-mem changes needs to happen for page conversion
>>>>>>>>>> events. Introduce RamDiscardManager to guest_memfd to achieve this.
>>>>>>>>>>
>>>>>>>>>> However, guest_memfd is not an object, so it cannot directly implement
>>>>>>>>>> the RamDiscardManager interface.
>>>>>>>>>>
>>>>>>>>>> One solution is to implement the interface in HostMemoryBackend. Any
>>>>>>>>>
>>>>>>>>> This sounds about right.
>>>
>>> btw I have been using this for ages:
>>>
>>> https://github.com/aik/qemu/commit/3663f889883d4aebbeb0e4422f7be5e357e2ee46
>>>
>>> but I am not sure if this ever saw the light of day, did it?
>>> (ironically I am using it as a base for encrypted DMA :) )
>>
>> Yeah, we are doing the same work. I saw a solution from Michael a long
>> time ago (when there was still a dedicated hostmem-memfd-private backend
>> for restrictedmem/gmem):
>> https://github.com/AMDESE/qemu/commit/3bf5255fc48d648724d66410485081ace41d8ee6
>>
>> Your patch only implements the interface for HostMemoryBackendMemfd.
>> Maybe it is more appropriate to implement it for the parent object
>> HostMemoryBackend, because besides MEMORY_BACKEND_MEMFD, other backend
>> types like MEMORY_BACKEND_RAM and MEMORY_BACKEND_FILE can also be
>> guest_memfd-backed.
>>
>> Thinking more about where to implement this interface, it is still
>> uncertain to me. As I mentioned in another mail, maybe the ram device
>> memory region would be backed by guest_memfd if we support TEE IO
>> iommufd MMIO in the future. Then a specific object is more appropriate.
>> What's your opinion?
>
> I do not know about this.
> Unlike RAM, MMIO can only do "in-place conversion" and the interface to
> do so is not straightforward, and VFIO owns MMIO anyway so the uAPI will
> be in iommufd; here is a gist of it:
>
> https://github.com/aik/linux/commit/89e45c0404fa5006b2a4de33a4d582adf1ba9831
>
> "guest request" is a communication channel from the VM to the secure FW
> (AMD's "PSP") to make MMIO allow encrypted access.

It is still uncertain how to implement the private MMIO. Our assumption
is that private MMIO would also create a memory region with a
guest_memfd-like backend. Its mr->ram would be true and it should be
managed by a RamDiscardManager, which can skip doing DMA_MAP in VFIO's
region_add listener.

>
>>>
>>>>>>>>>
>>>>>>>>>> guest_memfd-backed host memory backend can register itself in the
>>>>>>>>>> target MemoryRegion. However, this solution doesn't cover the
>>>>>>>>>> scenario where a guest_memfd MemoryRegion doesn't belong to the
>>>>>>>>>> HostMemoryBackend, e.g. the virtual BIOS MemoryRegion.
>>>>>>>>>
>>>>>>>>> What is this virtual BIOS MemoryRegion exactly? What does it look
>>>>>>>>> like in "info mtree -f"? Do we really want this memory to be DMAable?
>>>>>>>>
>>>>>>>> The virtual BIOS shows in a separate region:
>>>>>>>>
>>>>>>>> Root memory region: system
>>>>>>>>   0000000000000000-000000007fffffff (prio 0, ram): pc.ram KVM
>>>>>>>> ...
>>>>>>>>   00000000ffc00000-00000000ffffffff (prio 0, ram): pc.bios KVM
>>>>>>>
>>>>>>> Looks like a normal MR which can be backed by guest_memfd.
>>>>>>
>>>>>> Yes, the virtual BIOS memory region is initialized by
>>>>>> memory_region_init_ram_guest_memfd(), which will be backed by a
>>>>>> guest_memfd.
>>>>>>
>>>>>> The tricky thing is, for Intel TDX (not sure about AMD SEV), the
>>>>>> virtual BIOS image will be loaded and then copied to the private
>>>>>> region. After that, the loaded image will be discarded and this
>>>>>> region becomes useless.
>>>>>
>>>>> I'd think it is loaded as "struct Rom" and then copied to the
>>>>> MR-ram_guest_memfd() which does not leave the MR useless - we still
>>>>> see "pc.bios" in the list so it is not discarded. What piece of code
>>>>> are you referring to exactly?
>>>>
>>>> Sorry for the confusion, maybe it is different between TDX and SEV-SNP
>>>> for the vBIOS handling.
>>>>
>>>> In x86_bios_rom_init(), it initializes a guest_memfd-backed MR and
>>>> loads the vBIOS image to the shared part of the guest_memfd MR. For
>>>> TDX, it will copy the image to the private region (not the vBIOS
>>>> guest_memfd MR private part) and discard the shared part. So, although
>>>> the memory region still exists, it seems useless.
>>>>
>>>> It is different for SEV-SNP, correct? Does SEV-SNP manage the vBIOS in
>>>> vBIOS guest_memfd private memory?
>>>
>>> This is what it looks like on my SNP VM (which, I suspect, is the same
>>> as yours as hw/i386/pc.c does not distinguish Intel/AMD for this
>>> matter):
>>
>> Yes, the memory region object is created on both TDX and SEV-SNP.
>>
>>>
>>> Root memory region: system
>>>   0000000000000000-00000000000bffff (prio 0, ram): ram1 KVM gmemfd=20
>>>   00000000000c0000-00000000000dffff (prio 1, ram): pc.rom KVM gmemfd=27
>>>   00000000000e0000-000000001fffffff (prio 0, ram): ram1 @00000000000e0000 KVM gmemfd=20
>>> ...
>>>   00000000ffc00000-00000000ffffffff (prio 0, ram): pc.bios KVM gmemfd=26
>>>
>>> So the pc.bios MR exists and is in use (hence its appearance in
>>> "info mtree -f").
>>>
>>> I added the gmemfd dumping:
>>>
>>> --- a/system/memory.c
>>> +++ b/system/memory.c
>>> @@ -3446,6 +3446,9 @@ static void mtree_print_flatview(gpointer key, gpointer value,
>>>              }
>>>          }
>>>      }
>>> +    if (mr->ram_block && mr->ram_block->guest_memfd >= 0) {
>>> +        qemu_printf(" gmemfd=%d", mr->ram_block->guest_memfd);
>>> +    }
>>>
>>
>> Then I think the virtual BIOS is another case not belonging to a
>> HostMemoryBackend, which convinces us to implement the interface in a
>> specific object, no?
>
> TBH I have no idea why pc.rom and pc.bios are separate memory regions,
> but in any case why do these 2 areas need to be treated any differently
> than the rest of RAM? Thanks,

I think there is no difference. That's why I suggest implementing the
RDM interface in a specific object to cover both, instead of only in
HostMemoryBackend.
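For illustration, such a dedicated object could be a small QOM type that
implements the RamDiscardManager interface and is attached to each
guest_memfd-backed MemoryRegion. The skeleton below is only a sketch of
that idea, not the posted patch: the type name GuestMemfdManager, the
shared_bitmap layout and all helper names are made up for this example,
and the replay/listener callbacks are only described in comments.

/*
 * Rough sketch only, not the actual series: a dedicated QOM object that
 * implements RamDiscardManager for a guest_memfd-backed MemoryRegion.
 * All names and fields below are illustrative assumptions.
 */
#include "qemu/osdep.h"
#include "qemu/bitops.h"
#include "qemu/module.h"
#include "exec/memory.h"
#include "qom/object.h"

#define TYPE_GUEST_MEMFD_MANAGER "guest-memfd-manager"
OBJECT_DECLARE_SIMPLE_TYPE(GuestMemfdManager, GUEST_MEMFD_MANAGER)

struct GuestMemfdManager {
    Object parent_obj;
    MemoryRegion *mr;             /* the guest_memfd-backed MR it manages */
    uint64_t block_size;          /* private<->shared conversion granularity */
    unsigned long *shared_bitmap; /* one bit per block: 1 = shared */
};

static uint64_t gmm_rdm_get_min_granularity(const RamDiscardManager *rdm,
                                            const MemoryRegion *mr)
{
    return GUEST_MEMFD_MANAGER(rdm)->block_size;
}

/* "Populated" is reinterpreted as "shared", i.e. safe for VFIO to DMA-map. */
static bool gmm_rdm_is_populated(const RamDiscardManager *rdm,
                                 const MemoryRegionSection *section)
{
    const GuestMemfdManager *gmm = GUEST_MEMFD_MANAGER(rdm);
    uint64_t first = section->offset_within_region / gmm->block_size;
    uint64_t last = (section->offset_within_region +
                     int128_get64(section->size) - 1) / gmm->block_size;
    uint64_t i;

    for (i = first; i <= last; i++) {
        if (!test_bit(i, gmm->shared_bitmap)) {
            return false;
        }
    }
    return true;
}

static void guest_memfd_manager_class_init(ObjectClass *oc, void *data)
{
    RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(oc);

    rdmc->get_min_granularity = gmm_rdm_get_min_granularity;
    rdmc->is_populated = gmm_rdm_is_populated;
    /*
     * replay_populated, replay_discarded, register_listener and
     * unregister_listener would walk shared_bitmap and notify the
     * registered RamDiscardListeners on each page conversion, the
     * same way virtio-mem does for plug/unplug.
     */
}

static const TypeInfo guest_memfd_manager_info = {
    .name          = TYPE_GUEST_MEMFD_MANAGER,
    .parent        = TYPE_OBJECT,
    .instance_size = sizeof(GuestMemfdManager),
    .class_init    = guest_memfd_manager_class_init,
    .interfaces    = (InterfaceInfo[]) {
        { TYPE_RAM_DISCARD_MANAGER },
        { }
    },
};

static void guest_memfd_manager_register_types(void)
{
    type_register_static(&guest_memfd_manager_info);
}
type_init(guest_memfd_manager_register_types);

Each guest_memfd-backed MR (pc.ram, pc.bios and pc.rom alike) would then
get such an object attached via memory_region_set_ram_discard_manager(),
so VFIO's RamDiscardListener would only DMA-map the shared parts and be
notified on every private<->shared conversion.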