On 15/1/25 17:15, Chenyi Qiang wrote:
On 1/15/2025 12:06 PM, Alexey Kardashevskiy wrote:
On 10/1/25 17:38, Chenyi Qiang wrote:
On 1/10/2025 8:58 AM, Alexey Kardashevskiy wrote:
On 9/1/25 15:29, Chenyi Qiang wrote:
On 1/9/2025 10:55 AM, Alexey Kardashevskiy wrote:
On 9/1/25 13:11, Chenyi Qiang wrote:
On 1/8/2025 7:20 PM, Alexey Kardashevskiy wrote:
On 8/1/25 21:56, Chenyi Qiang wrote:
On 1/8/2025 12:48 PM, Alexey Kardashevskiy wrote:
On 13/12/24 18:08, Chenyi Qiang wrote:
As the commit 852f0048f3 ("RAMBlock: make guest_memfd require
uncoordinated discard") highlighted, some subsystems like VFIO might
disable ram block discard. However, guest_memfd relies on the discard
operation to perform page conversion between private and shared memory.
This can lead to stale IOMMU mapping issue when assigning a hardware
device to a confidential VM via shared memory (unprotected memory
pages). Blocking shared page discard can solve this problem, but it
could cause guests to consume twice the memory with VFIO, which is not
acceptable in some cases. An alternative solution is to convey other
systems like VFIO to refresh its outdated IOMMU mappings.
RamDiscardManager is an existing concept (used by virtio-mem) to adjust
VFIO mappings in relation to VM page assignment. Effectively page
conversion is similar to hot-removing a page in one mode and adding it
back in the other, so the similar work that needs to happen in response
to virtio-mem changes needs to happen for page conversion events.
Introduce the RamDiscardManager to guest_memfd to achieve it.
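
To make the paragraph above concrete, here is a toy model of the idea, not QEMU code: converting a page to shared is reported to a listener as "populate" (a VFIO-like listener would map it), and converting it to private is reported as "discard" (the listener would unmap it). Every name here (ToyManager, ToyListener, toy_convert and so on) is hypothetical and only loosely mirrors the populate/discard callbacks of the real RamDiscardManager interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 8

/* Hypothetical listener: stands in for something like VFIO. */
typedef struct ToyListener ToyListener;
struct ToyListener {
    void (*notify_populate)(ToyListener *l, size_t page); /* page became shared */
    void (*notify_discard)(ToyListener *l, size_t page);  /* page became private */
    bool mapped[TOY_NPAGES];                              /* toy "IOMMU" state */
};

/* Hypothetical manager: tracks which pages are currently shared. */
typedef struct {
    bool shared[TOY_NPAGES];
    ToyListener *listener;
} ToyManager;

/* Toy stand-ins for DMA map/unmap. */
static void toy_map(ToyListener *l, size_t page)   { l->mapped[page] = true; }
static void toy_unmap(ToyListener *l, size_t page) { l->mapped[page] = false; }

/*
 * Convert [first, first + count) to shared or private and notify the
 * listener only for pages whose state actually changes.
 */
static void toy_convert(ToyManager *m, size_t first, size_t count,
                        bool to_shared)
{
    assert(first <= TOY_NPAGES && count <= TOY_NPAGES - first);
    for (size_t p = first; p < first + count; p++) {
        if (m->shared[p] == to_shared) {
            continue; /* already in the requested state */
        }
        m->shared[p] = to_shared;
        if (!m->listener) {
            continue;
        }
        if (to_shared) {
            m->listener->notify_populate(m->listener, p);
        } else {
            m->listener->notify_discard(m->listener, p);
        }
    }
}

/* Count pages the listener currently has mapped. */
static size_t toy_mapped_count(const ToyListener *l)
{
    size_t n = 0;
    for (size_t p = 0; p < TOY_NPAGES; p++) {
        n += l->mapped[p] ? 1 : 0;
    }
    return n;
}
```

With a listener wired up as { toy_map, toy_unmap }, converting pages 2..4 to shared leaves three pages mapped, and converting page 3 back to private drops its mapping, so no stale mapping remains for a page that is now private.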
However, guest_memfd is not an object so it cannot directly implement
the RamDiscardManager interface.
One solution is to implement the interface in HostMemoryBackend.
Any
This sounds about right.
btw I have been using this for ages:
https://github.com/aik/qemu/commit/3663f889883d4aebbeb0e4422f7be5e357e2ee46
but I am not sure if this ever saw the light of day, did it?
(ironically I am using it as a base for encrypted DMA :) )
Yeah, we are doing the same work. I saw a solution from Michael a long
time ago (when there was still a dedicated hostmem-memfd-private backend
for restrictedmem/gmem):
https://github.com/AMDESE/qemu/commit/3bf5255fc48d648724d66410485081ace41d8ee6
Your patch only implements the interface for HostMemoryBackendMemfd.
Maybe it is more appropriate to implement it for the parent object
HostMemoryBackend, because besides MEMORY_BACKEND_MEMFD, other backend
types like MEMORY_BACKEND_RAM and MEMORY_BACKEND_FILE can also be
guest_memfd-backed.
Thinking more about where to implement this interface, it is still
uncertain to me. As I mentioned in another mail, a ram device memory
region might be backed by guest_memfd if we support TEE IO iommufd MMIO
in the future. In that case a specific object is more appropriate.
What's your opinion?
I do not know about this. Unlike RAM, MMIO can only do "in-place
conversion" and the interface to do so is not straightforward. VFIO
owns MMIO anyway, so the uAPI will be in iommufd; here is a gist of it:
https://github.com/aik/linux/commit/89e45c0404fa5006b2a4de33a4d582adf1ba9831
"guest request" is a communication channel from the VM to the secure FW
(AMD's "PSP") to make MMIO allow encrypted access.
It is still uncertain how to implement the private MMIO. Our assumption
is that the private MMIO would also create a memory region with a
guest_memfd-like backend. Its mr->ram is true and it should be managed
by RamDiscardManager, which can skip doing DMA_MAP in VFIO's region_add
listener.
My current working approach is to leave it as is in QEMU and VFIO. And
then avoid page state changes in KVM when private MMIO fault happens,
and treat it just like normal MMIO.
And iommufd does not allow mapping VFIO MMIO anyway, I am getting:
qemu-system-x86_64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
I am really hoping for in-place memory conversion to become available
sooner rather than later :)
guest_memfd-backed host memory backend can register itself in the target
MemoryRegion. However, this solution doesn't cover the scenario where a
guest_memfd MemoryRegion doesn't belong to the HostMemoryBackend, e.g.
the virtual BIOS MemoryRegion.
What is this virtual BIOS MemoryRegion exactly? What does it look like
in "info mtree -f"? Do we really want this memory to be DMAable?
virtual BIOS shows in a separate region:
Root memory region: system
0000000000000000-000000007fffffff (prio 0, ram): pc.ram KVM
...
00000000ffc00000-00000000ffffffff (prio 0, ram): pc.bios KVM
Looks like a normal MR which can be backed by guest_memfd.
Yes, virtual BIOS memory region is initialized by
memory_region_init_ram_guest_memfd() which will be backed by a
guest_memfd.
The tricky thing is, for Intel TDX (not sure about AMD SEV), the virtual
BIOS image will be loaded and then copied to the private region. After
that, the loaded image will be discarded and this region becomes useless.
I'd think it is loaded as "struct Rom" and then copied to the
MR-ram_guest_memfd() which does not leave MR useless - we still see
"pc.bios" in the list so it is not discarded. What piece of code are you
referring to exactly?
Sorry for the confusion, maybe the vBIOS handling differs between TDX
and SEV-SNP.
x86_bios_rom_init() initializes a guest_memfd-backed MR and loads the
vBIOS image into the shared part of the guest_memfd MR. For TDX, it
will copy the image to a private region (not the private part of the
vBIOS guest_memfd MR) and discard the shared part. So, although the
memory region still exists, it seems useless.
It is different for SEV-SNP, correct? Does SEV-SNP manage the vBIOS in
vBIOS guest_memfd private memory?
This is what it looks like on my SNP VM (which, I suspect, is the same
as yours as hw/i386/pc.c does not distinguish Intel/AMD for this
matter):
Yes, the memory region object is created on both TDX and SEV-SNP.
Root memory region: system
0000000000000000-00000000000bffff (prio 0, ram): ram1 KVM gmemfd=20
00000000000c0000-00000000000dffff (prio 1, ram): pc.rom KVM gmemfd=27
00000000000e0000-000000001fffffff (prio 0, ram): ram1 @00000000000e0000 KVM gmemfd=20
...
00000000ffc00000-00000000ffffffff (prio 0, ram): pc.bios KVM gmemfd=26
So the pc.bios MR exists and is in use (hence its appearance in
"info mtree -f").
I added the gmemfd dumping:
--- a/system/memory.c
+++ b/system/memory.c
@@ -3446,6 +3446,9 @@ static void mtree_print_flatview(gpointer key, gpointer value,
}
}
}
+ if (mr->ram_block && mr->ram_block->guest_memfd >= 0) {
+ qemu_printf(" gmemfd=%d", mr->ram_block->guest_memfd);
+ }
Then I think the virtual BIOS is another case that does not belong to
HostMemoryBackend, which convinces us to implement the interface in a
specific object, no?
TBH I have no idea why pc.rom and pc.bios are separate memory regions
but in any case why do these 2 areas need to be treated any different
than the rest of RAM? Thanks,
I think there is no difference. That's why I suggest implementing the
RDM interface in a specific object that covers both, instead of only in
HostMemoryBackend.
I am still confused. Sounds like nothing prevents doing it either way,
just a matter of taste, is that right? Thanks,
--
Alexey