On Tue, Nov 7, 2023 at 2:29 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
> On Tue, Nov 07, 2023 at 05:25:06PM +0100, Paolo Bonzini wrote:
> > On 11/6/23 21:23, Peter Xu wrote:
> > > On Mon, Nov 06, 2023 at 10:25:13AM -0800, David Matlack wrote:
> > >
> >
> > Once you have the implementation done for guest_memfd, it is interesting to
> > see how easily it extends to other, userspace-mappable kinds of memory. But
> > I still dislike the fact that you need some kind of extra protocol in
> > userspace, for multi-process VMMs. This is the kind of thing that the
> > kernel is supposed to facilitate. I'd like it to do _more_ of that (see
> > above memfd pseudo-suggestion), not less.
>
> Is that our future plan to extend gmemfd to normal memories?
>
> I see that gmemfd manages folio on its own. I think it'll make perfect
> sense if it's for use in CoCo context, where the memory is so special to be
> generic anyway.
>
> However if to extend it to generic memories, I'm wondering how do we
> support existing memory features of such memory which already exist with
> KVM_SET_USER_MEMORY_REGION v1. To name some:
>
> - numa awareness
> - swapping
> - cgroup
> - punch hole (in a huge page, aka, thp split)
> - cma allocations for huge pages / page migrations
> - ...

Sean has stated that he doesn't want guest_memfd to support swap. So I
don't think guest_memfd will one day replace all guest memory
use-cases. That also means that my idea to extend my proposal to
guest_memfd VMAs has limited value. VMs that do not use guest_memfd
would not be able to use it.

Paolo, it sounds like overall my proposal has limited value outside of
GCE's use-case. And even if it landed upstream, it would bifurcate KVM
VM post-copy support. So I think it's probably not worth pursuing
further. Do you think that's a fair assessment? Getting a clear NACK
on pushing this proposal upstream would be a nice outcome here since
it helps inform our next steps.

That being said, we still don't have an upstream solution for 1G
post-copy, which James pointed out is really the core issue. But there
are other avenues we can explore in that direction such as cleaning up
HugeTLB (very nebulous) or adding 1G+mmap()+userfaultfd support to
guest_memfd. The latter seems promising.