On Thu, Mar 13, 2025 at 10:13:23PM +0000, Nikita Kalyazin wrote:
> Yes, that's right, mmap() + memcpy() is functionally sufficient. write() is
> an optimisation. Most of the pages in guest_memfd are only ever accessed by
> the vCPU (not userspace) via TDP (stage-2 pagetables) so they don't need
> userspace pagetables set up. By using write() we can avoid VMA faults,
> installing corresponding PTEs and double page initialisation we discussed
> earlier. The optimised path only contains pagecache population via write().
> Even TDP faults can be avoided if using KVM prefaulting API [1].
>
> [1] https://docs.kernel.org/virt/kvm/api.html#kvm-pre-fault-memory

Could you elaborate on why VMA faults matter for performance?

If we're talking about postcopy-like migrations on top of KVM guest-memfd,
IIUC the VMAs can be pre-faulted too, just like the TDP pgtables, e.g. with
MADV_POPULATE_WRITE.

Normally, AFAIU, userspace apps optimize I/O the other way round: they change
write()s into mmap()s, which at least avoids one round of copying.

For postcopy using minor traps (and since guest-memfd is always shared and
non-private..), it's also possible to feed the mmap()ed VAs to the NIC as
buffers (e.g. as part of the iovec[] passed to recvmsg()), and as long as the
mmap()ed ranges are not registered in KVM memslots, there's no concern about
non-atomic copies.

Thanks,

-- 
Peter Xu
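
For reference, a minimal sketch of the two population paths being compared:
mmap() with the VMA pre-faulted via MADV_POPULATE_WRITE, then memcpy(), versus
filling the page cache through the fd with pwrite(). Assumptions: a plain
memfd from memfd_create() stands in for a guest_memfd (write() on guest_memfd
is what this thread proposes, so it is not used here), the helper names are
hypothetical, and the fallback for kernels without MADV_POPULATE_WRITE simply
touches each page.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23 /* Linux >= 5.14; value from the uapi headers */
#endif

/* Hypothetical helper: create a memfd of `len` bytes as a stand-in for a
 * guest_memfd. Returns the fd, or -1 on error. */
static int make_backing_fd(size_t len)
{
    int fd = memfd_create("guest-mem-standin", 0);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map the fd shared and pre-fault the whole range so that later stores do
 * not take VMA faults. Returns NULL on mmap() failure. */
static void *map_prefaulted(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (madvise(p, len, MADV_POPULATE_WRITE) < 0) {
        /* Assumed fallback (e.g. EINVAL on older kernels): touch each page. */
        size_t pg = (size_t)sysconf(_SC_PAGESIZE);
        volatile char *c = p;
        for (size_t off = 0; off < len; off += pg)
            c[off] = c[off];
    }
    return p;
}

/* Path A: populate through the (already faulted-in) userspace mapping. */
static void fill_via_mmap(void *map, size_t off, const void *src, size_t n)
{
    assert(map != NULL);
    memcpy((char *)map + off, src, n);
}

/* Path B: populate the page cache through the fd, with no userspace PTEs
 * involved. Returns 0 on success, -1 on error. */
static int fill_via_write(int fd, size_t off, const void *src, size_t n)
{
    const char *p = src;
    while (n > 0) {
        ssize_t w = pwrite(fd, p, n, (off_t)off);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (w == 0)
            return -1; /* no progress; give up rather than spin */
        p += w;
        off += (size_t)w;
        n -= (size_t)w;
    }
    return 0;
}
```

Both paths end up in the same page cache pages, so data written with
fill_via_write() is visible through the mapping from map_prefaulted() and
vice versa. The sketch only shows the mechanics; it says nothing about the
relative cost of the two paths.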