Re: RFC: A KVM-specific alternative to UserfaultFD

On Thu, Nov 09, 2023 at 09:58:49AM -0800, Sean Christopherson wrote:
> guest_memfd isn't intended to be a wholesale replacement of VMA-based memory.
> IMO, use cases that want to dynamically manage guest memory should be firmly
> out-of-scope for guest_memfd.

I'm not sure whether that will hold true in the longer term (e.g. 5-10
years, or more?), but it makes sense to me for now; at least we shouldn't
decide up front to reimplement everything.

If the use case grows and CoCo becomes the de-facto standard, hopefully
there will always be the possibility to refactor the mm features that CoCo
needs so they cooperate with gmemfd, I guess.

> 
> > Paolo, it sounds like overall my proposal has limited value outside of
> > GCE's use-case. And even if it landed upstream, it would bifurcate KVM
> > VM post-copy support. So I think it's probably not worth pursuing
> > further. Do you think that's a fair assessment? Getting a clear NACK
> > on pushing this proposal upstream would be a nice outcome here since
> > it helps inform our next steps.
> > 
> > That being said, we still don't have an upstream solution for 1G
> > post-copy, which James pointed out is really the core issue. But there
> > are other avenues we can explore in that direction such as cleaning up
> > HugeTLB (very nebulous) or adding 1G+mmap()+userfaultfd support to
> > guest_memfd. The latter seems promising.
> 
> mmap()+userfaultfd is the answer for userspace and vhost, but it is most definitely
> not the answer for guest_memfd within KVM.  The main selling point of guest_memfd
> is that it doesn't require mapping the memory into userspace, i.e. userfaultfd
> can't be the answer for KVM accesses unless we bastardize the entire concept of
> guest_memfd.

Note that I don't think userfaultfd needs to be bound to VAs, even if it is
for now.

> And as I've proposed internally, the other thing related to live migration that I
> think KVM should support is the ability to performantly and non-destructively freeze
guest memory, e.g. to allow blocking KVM accesses to guest memory during blackout
> without requiring userspace to destroy memslots to harden against memory corruption
> due to KVM writing guest memory after userspace has taken the final snapshot of the
> dirty bitmap.

Do you have any pointer to the problem you're describing?  Why can't
userspace have full control of when to quiesce guest memory accesses
(probably by kicking all vcpus out)?

> For both cases, KVM will need choke points on all accesses to guest memory.  Once
> the choke points exist and we have signed up to maintain them, the extra burden of
> gracefully handling "missing" memory versus frozen memory should be relatively
> small, e.g. it'll mainly be the notify-and-wait uAPI.
> 
> Don't get me wrong, I think Google's demand paging implementation should die a slow,
> horrible death.   But I don't think userfaultfd is the answer for guest_memfd.

As I replied in the other thread, I see the possibility of implementing
userfaultfd on top of gmemfd, especially now that I know your plan treats
user and kernel accesses the same way.

But I don't know whether I may have missed something here and there, so I'd
like to first read about the problem above to understand the relationship
between the "freeze guest mem" idea and the demand paging scheme.

One thing I'd agree with is that we don't necessarily need to squash
userfaultfd into gmemfd's demand paging support.  If gmemfd will only be
used in a KVM context then indeed it at least won't make a major
difference; but it would still be good if the messaging framework could be
leveraged, and meanwhile userspace that already supports userfaultfd could
cooperate with gmemfd much more easily.

In general, a major part of userfaultfd is, to me, really a messaging
interface for faults.  A fault trap mechanism will be needed for gmemfd
anyway, AFAIU.  When that arrives, maybe we'll have a clearer idea of
what comes next.

Thanks,

-- 
Peter Xu
