On Thu, Nov 9, 2023 at 9:58 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> On Thu, Nov 09, 2023, David Matlack wrote:
> > Paolo, it sounds like overall my proposal has limited value outside of
> > GCE's use-case. And even if it landed upstream, it would bifurcate KVM
> > VM post-copy support. So I think it's probably not worth pursuing
> > further. Do you think that's a fair assessment? Getting a clear NACK
> > on pushing this proposal upstream would be a nice outcome here since
> > it helps inform our next steps.
> >
> > That being said, we still don't have an upstream solution for 1G
> > post-copy, which James pointed out is really the core issue. But there
> > are other avenues we can explore in that direction, such as cleaning up
> > HugeTLB (very nebulous) or adding 1G+mmap()+userfaultfd support to
> > guest_memfd. The latter seems promising.
>
> mmap()+userfaultfd is the answer for userspace and vhost, but it is most
> definitely not the answer for guest_memfd within KVM. The main selling
> point of guest_memfd is that it doesn't require mapping the memory into
> userspace, i.e. userfaultfd can't be the answer for KVM accesses unless
> we bastardize the entire concept of guest_memfd.
>
> And as I've proposed internally, the other thing related to live
> migration that I think KVM should support is the ability to performantly
> and non-destructively freeze guest memory, e.g. to allow blocking KVM
> accesses to guest memory during blackout without requiring userspace to
> destroy memslots to harden against memory corruption due to KVM writing
> guest memory after userspace has taken the final snapshot of the dirty
> bitmap.
>
> For both cases, KVM will need choke points on all accesses to guest
> memory. Once the choke points exist and we have signed up to maintain
> them, the extra burden of gracefully handling "missing" memory versus
> frozen memory should be relatively small, e.g. it'll mainly be the
> notify-and-wait uAPI.

To be honest, the choke points are a relatively small part of any
KVM-based demand paging scheme. We still need (a)-(e) from my original
email.

> Don't get me wrong, I think Google's demand paging implementation should
> die a slow, horrible death. But I don't think userfaultfd is the answer
> for guest_memfd.

I'm a bit confused. Yes, Google's implementation is not good, I said the
same in my original email. But if userfaultfd is not the answer for
guest_memfd, are you saying that KVM _does_ need a KVM-based demand
paging uAPI like I proposed?
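
To make the "choke point" idea concrete, here is a minimal sketch of what
such a funnel might look like. All names here (gmem_range_state,
kvm_gmem_begin_access, kvm_gmem_notify_userspace) are invented for
illustration and are not actual KVM code or uAPI; the real design would
have to come out of the discussion above.

  #include <linux/wait.h>

  /*
   * Hypothetical per-range state, not an actual KVM structure.
   * Tracks the two cases Sean describes: "missing" memory (demand
   * paging) and frozen memory (blackout).
   */
  struct gmem_range_state {
  	bool present;		/* backing memory is populated */
  	bool frozen;		/* KVM writes blocked during blackout */
  	wait_queue_head_t wq;	/* waiters for userspace to resolve */
  };

  /* Hypothetical: kick userspace via the notify half of a
   * notify-and-wait uAPI (e.g. an exit reason or eventfd). */
  void kvm_gmem_notify_userspace(struct gmem_range_state *state);

  /*
   * Hypothetical choke point that every KVM access to guest_memfd-backed
   * guest memory would funnel through.  Freezing only blocks writes here,
   * since the stated corruption concern is KVM writing guest memory after
   * the final dirty bitmap snapshot; reads of present memory proceed.
   */
  static int kvm_gmem_begin_access(struct gmem_range_state *state, bool write)
  {
  	/* Fast path: memory is present and the access is permitted. */
  	if (likely(state->present && !(write && state->frozen)))
  		return 0;

  	/* Slow path: notify userspace, then wait for it to populate
  	 * and/or thaw the range and wake the waitqueue. */
  	kvm_gmem_notify_userspace(state);
  	return wait_event_interruptible(state->wq,
  					state->present &&
  					!(write && state->frozen));
  }

Userspace would resolve the fault by populating or thawing the range and
then waking state->wq; whether the wait is interruptible, and whether
frozen reads should also block, are policy choices this sketch does not
settle.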