On Thu, Nov 9, 2023 at 9:58 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> On Thu, Nov 09, 2023, David Matlack wrote:
> > Paolo, it sounds like overall my proposal has limited value outside of
> > GCE's use-case. And even if it landed upstream, it would bifurcate KVM
> > VM post-copy support. So I think it's probably not worth pursuing
> > further. Do you think that's a fair assessment? Getting a clear NACK
> > on pushing this proposal upstream would be a nice outcome here since
> > it helps inform our next steps.
> >
> > That being said, we still don't have an upstream solution for 1G
> > post-copy, which James pointed out is really the core issue. But there
> > are other avenues we can explore in that direction, such as cleaning up
> > HugeTLB (very nebulous) or adding 1G+mmap()+userfaultfd support to
> > guest_memfd. The latter seems promising.
>
> mmap()+userfaultfd is the answer for userspace and vhost, but it is most
> definitely not the answer for guest_memfd within KVM. The main selling
> point of guest_memfd is that it doesn't require mapping the memory into
> userspace, i.e. userfaultfd can't be the answer for KVM accesses unless
> we bastardize the entire concept of guest_memfd.
>
> And as I've proposed internally, the other thing related to live
> migration that I think KVM should support is the ability to performantly
> and non-destructively freeze guest memory, e.g. to allow blocking KVM
> accesses to guest memory during blackout without requiring userspace to
> destroy memslots to harden against memory corruption due to KVM writing
> guest memory after userspace has taken the final snapshot of the dirty
> bitmap.
>
> For both cases, KVM will need choke points on all accesses to guest
> memory. Once the choke points exist and we have signed up to maintain
> them, the extra burden of gracefully handling "missing" memory versus
> frozen memory should be relatively small, e.g. it'll mainly be the
> notify-and-wait uAPI.

To be honest, the choke points are a relatively small part of any
KVM-based demand paging scheme. We still need (a)-(e) from my original
email.

> Don't get me wrong, I think Google's demand paging implementation should
> die a slow, horrible death. But I don't think userfaultfd is the answer
> for guest_memfd.

I'm a bit confused. Yes, Google's implementation is not good, I said the
same in my original email. But if userfaultfd is not the answer for
guest_memfd, are you saying that KVM _does_ need a KVM-based demand
paging uAPI like I proposed?
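
To make the "choke point" idea concrete, here is a minimal sketch of what
such a funnel might look like. All names here (gmem_range_state,
kvm_gmem_begin_access, kvm_gmem_notify_userspace) are invented for
illustration and are not actual KVM code or uAPI; the real design would
have to come out of the discussion above.

  #include <linux/wait.h>

  /*
   * Hypothetical per-range state, not an actual KVM structure.
   * Tracks the two cases Sean describes: "missing" memory (demand
   * paging) and frozen memory (blackout).
   */
  struct gmem_range_state {
  	bool present;		/* backing memory is populated */
  	bool frozen;		/* KVM writes blocked during blackout */
  	wait_queue_head_t wq;	/* waiters for userspace to resolve */
  };

  /* Hypothetical: kick userspace via the notify half of a
   * notify-and-wait uAPI (e.g. an exit reason or eventfd). */
  void kvm_gmem_notify_userspace(struct gmem_range_state *state);

  /*
   * Hypothetical choke point that every KVM access to guest_memfd-backed
   * guest memory would funnel through.  Freezing only blocks writes here,
   * since the stated corruption concern is KVM writing guest memory after
   * the final dirty bitmap snapshot; reads of present memory proceed.
   */
  static int kvm_gmem_begin_access(struct gmem_range_state *state, bool write)
  {
  	/* Fast path: memory is present and the access is permitted. */
  	if (likely(state->present && !(write && state->frozen)))
  		return 0;

  	/* Slow path: notify userspace, then wait for it to populate
  	 * and/or thaw the range and wake the waitqueue. */
  	kvm_gmem_notify_userspace(state);
  	return wait_event_interruptible(state->wq,
  					state->present &&
  					!(write && state->frozen));
  }

Userspace would resolve the fault by populating or thawing the range and
then waking state->wq; whether the wait is interruptible, and whether
frozen reads should also block, are policy choices this sketch does not
settle.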